In [1]:
import pandas as pd
import numpy as np
import itertools as it
import re
from tqdm.notebook import tqdm
import warnings; warnings.filterwarnings('ignore')

# visualization tools
import seaborn as sns
import matplotlib.pyplot as plt
import plotly as py 
import plotly.graph_objs as go
from plotly.offline import iplot
from wordcloud import WordCloud
import random

# nlp and ml tools
import spacy
from spacy.lang.en.stop_words import STOP_WORDS
from spacy.lang.en import English
import string
from nltk.sentiment.vader import SentimentIntensityAnalyzer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans
In [2]:
sns.set_context("talk")
sns.set_style('whitegrid', {'xtick.bottom':True, 'ytick.left':True})

Data Import and Cleaning

Data is scraped from Yelp and can be found in a separate notebook here. Up to 200 restaurants are selected for each of the 25 US counties with the most COVID-19 deaths; per restaurant, up to 100 reviews are scraped.

  a. Functions
  b. Distribution of Restaurants


In [3]:
df = pd.read_csv('review_data_2020_04_20_09_04.csv', index_col='Unnamed: 0')
df.head()
Out[3]:
alias categories city_x coordinates is_closed location name price rating review_count transactions author city_y description publish_date review_number score
--CprxtcUfzKoz29hAzm5w wellys-restaurant-marlborough [{'alias': 'tradamerican', 'title': 'American ... Middlesex, Massachusetts {'latitude': 42.3469, 'longitude': -71.54837} False {'address1': '153 Main St', 'address2': '', 'a... Welly's Restaurant $$ 4.0 170 ['delivery'] Sara P. Middlesex, Massachusetts I haven't even left yet. But I am thoroughly p... 2020-01-24 1.0 5.0
--CprxtcUfzKoz29hAzm5w wellys-restaurant-marlborough [{'alias': 'tradamerican', 'title': 'American ... Middlesex, Massachusetts {'latitude': 42.3469, 'longitude': -71.54837} False {'address1': '153 Main St', 'address2': '', 'a... Welly's Restaurant $$ 4.0 170 ['delivery'] Matt K. Middlesex, Massachusetts I've been to Welly's a few times and each time... 2019-11-12 2.0 4.0
--CprxtcUfzKoz29hAzm5w wellys-restaurant-marlborough [{'alias': 'tradamerican', 'title': 'American ... Middlesex, Massachusetts {'latitude': 42.3469, 'longitude': -71.54837} False {'address1': '153 Main St', 'address2': '', 'a... Welly's Restaurant $$ 4.0 170 ['delivery'] AnnMarie H. Middlesex, Massachusetts Welly's is a very popular bar right in downtow... 2019-11-08 3.0 4.0
--CprxtcUfzKoz29hAzm5w wellys-restaurant-marlborough [{'alias': 'tradamerican', 'title': 'American ... Middlesex, Massachusetts {'latitude': 42.3469, 'longitude': -71.54837} False {'address1': '153 Main St', 'address2': '', 'a... Welly's Restaurant $$ 4.0 170 ['delivery'] Christy C. Middlesex, Massachusetts I used to think the idea of fish tacos were no... 2020-04-03 4.0 5.0
--CprxtcUfzKoz29hAzm5w wellys-restaurant-marlborough [{'alias': 'tradamerican', 'title': 'American ... Middlesex, Massachusetts {'latitude': 42.3469, 'longitude': -71.54837} False {'address1': '153 Main St', 'address2': '', 'a... Welly's Restaurant $$ 4.0 170 ['delivery'] Marissa V. Middlesex, Massachusetts Came here with some work colleagues and ordere... 2020-01-03 5.0 4.0
In [4]:
len(df)
Out[4]:
342034
In [5]:
# fix data types
df['publish_date'] = pd.to_datetime(df['publish_date'])
df = df.reset_index().rename(columns={'index':'restaurant_id'})
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 342034 entries, 0 to 342033
Data columns (total 18 columns):
restaurant_id    342034 non-null object
alias            342034 non-null object
categories       342034 non-null object
city_x           342034 non-null object
coordinates      342034 non-null object
is_closed        342034 non-null bool
location         342034 non-null object
name             342034 non-null object
price            328831 non-null object
rating           342034 non-null float64
review_count     342034 non-null int64
transactions     342034 non-null object
author           342032 non-null object
city_y           342032 non-null object
description      342032 non-null object
publish_date     342032 non-null datetime64[ns]
review_number    342032 non-null float64
score            342032 non-null float64
dtypes: bool(1), datetime64[ns](1), float64(3), int64(1), object(12)
memory usage: 44.7+ MB
In [6]:
# we know some reviews are duplicates because the restaurant falls into multiple cities
df[(df['author']=='Christopher V.') & (df['name']=='Cara Mia')]
Out[6]:
restaurant_id alias categories city_x coordinates is_closed location name price rating review_count transactions author city_y description publish_date review_number score
900 -AuiDPDxsByJ9utiIWYgpg cara-mia-millburn [{'alias': 'italian', 'title': 'Italian'}] Essex, New Jersey {'latitude': 40.7247699, 'longitude': -74.3068... False {'address1': '194 Essex St', 'address2': None,... Cara Mia $$$ 4.0 163 ['delivery'] Christopher V. Union, New Jersey When you walk into Cara Mia, you're greeted li... 2020-03-27 1.0 5.0
1000 -AuiDPDxsByJ9utiIWYgpg cara-mia-millburn [{'alias': 'italian', 'title': 'Italian'}] Union, New Jersey {'latitude': 40.7247699, 'longitude': -74.3068... False {'address1': '194 Essex St', 'address2': None,... Cara Mia $$$ 4.0 163 ['delivery'] Christopher V. Union, New Jersey When you walk into Cara Mia, you're greeted li... 2020-03-27 1.0 5.0
In [7]:
df_clean = df.drop_duplicates(subset=['coordinates', 'name','description'])
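
To make the de-duplication rule concrete, here is a minimal sketch on a toy frame. The rows and coordinate values are made up for illustration; only the column names and the `subset` argument mirror the cell above.

```python
import pandas as pd

# Toy stand-in for the scraped data (values are invented): the same review
# appears under two counties, plus one genuinely different review.
toy = pd.DataFrame({
    'name': ['Cara Mia', 'Cara Mia', 'Cara Mia'],
    'city_x': ['Essex, New Jersey', 'Union, New Jersey', 'Essex, New Jersey'],
    'coordinates': ["{'latitude': 40.72, 'longitude': -74.31}"] * 3,
    'description': ['Great pasta.', 'Great pasta.', 'Slow service.'],
})

# Rows count as duplicates when location, name, and review text all match.
# city_x is deliberately left out of the subset, so the county label is ignored.
toy_clean = toy.drop_duplicates(subset=['coordinates', 'name', 'description'])

print(len(toy), '->', len(toy_clean))
print(toy_clean['city_x'].tolist())
```

Because `drop_duplicates` keeps the first occurrence by default, the county that survives for a shared restaurant depends on row order. That is worth keeping in mind for any later per-region comparison.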

Functions

Included here are some bootstrapping functions to be used for determining significance.


In [8]:
# functions to run a bootstrap
np.random.seed(47)
def bootstrap_replicate_1d(data, func):
    return func(np.random.choice(data, size=len(data)))

def draw_bs_reps(data, func, size=1):
    """Draw bootstrap replicates."""

    # Initialize array of replicates: bs_replicates
    bs_replicates = np.empty(size)

    # Generate replicates
    for i in range(size):
        bs_replicates[i] = bootstrap_replicate_1d(data, func)

    return bs_replicates

def bootstrap(groupa, groupb):
    
    mean_diff = np.mean(groupa) - np.mean(groupb)
    
    bs_replicates_a = draw_bs_reps(groupa, np.mean, size=10000)
    bs_replicates_b = draw_bs_reps(groupb, np.mean, size=10000)
    
    bs_diff_replicates = bs_replicates_a - bs_replicates_b
    
    conf_int = np.percentile(bs_diff_replicates, [2.5, 97.5])
    
    # Compute mean of combined data set: combined_mean
    combined_mean = np.mean(np.concatenate([groupa, groupb]))

    # Shift the samples
    shifted_a = groupa - np.mean(groupa) + combined_mean
    shifted_b = groupb - np.mean(groupb) + combined_mean

    # Get bootstrap replicates of shifted data sets
    bs_replicates_a_shifted = draw_bs_reps(shifted_a, np.mean, 10000)
    bs_replicates_b_shifted = draw_bs_reps(shifted_b, np.mean, 10000)

    # Compute replicates of difference of means: bs_diff_replicates
    bs_diff_replicates_shifted = bs_replicates_a_shifted - bs_replicates_b_shifted

    # Compute the p-value for significantly higher values
    p_high = np.sum(bs_diff_replicates_shifted >= mean_diff) / len(bs_diff_replicates_shifted)
   
    # Compute the p-value for significantly lower values
    p_low = np.sum(bs_diff_replicates_shifted <= mean_diff) / len(bs_diff_replicates_shifted)
    
    return mean_diff, conf_int, p_high, p_low

def bootstrap_all(categorical_list, df_name, categorical_variable, test_variable):
    #compare data for each variable in categorical_list to all other data
    
    df_list = []
    for i in tqdm(categorical_list):
        others = df_name[df_name[categorical_variable]!=i].dropna(subset=[test_variable])
        i_var = df_name[df_name[categorical_variable]==i].dropna(subset=[test_variable])
        i_mean = np.mean(i_var[test_variable])
        o_mean = np.mean(others[test_variable])
        meandiff, conf_int, p_high, p_low = bootstrap(i_var[test_variable], others[test_variable])
        df_list.append([i, i_mean, o_mean, meandiff, conf_int, p_high, p_low])

    pval = pd.DataFrame(df_list)
    pval.columns = ['Variable', 'Mean', 'Mean of Others', 'Mean Difference', '95% CI', 'p-value_high', 'p-value_low']
    return pval
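
As a quick sanity check of the idea behind these functions, the sketch below runs the same kind of two-sample bootstrap on synthetic data. It is a compact, vectorized variant rather than the notebook's own code: the name `bootstrap_mean_diff`, the synthetic group means, and the replicate count are all assumptions chosen for illustration.

```python
import numpy as np

def bootstrap_mean_diff(a, b, n_reps=2000, seed=47):
    """Two-sample bootstrap for a difference of means (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    observed = a.mean() - b.mean()

    def replicate_means(x):
        # Resample with replacement: one row of indices per bootstrap replicate.
        idx = rng.integers(0, len(x), size=(n_reps, len(x)))
        return x[idx].mean(axis=1)

    # 95% percentile confidence interval for the difference of means.
    diffs = replicate_means(a) - replicate_means(b)
    conf_int = np.percentile(diffs, [2.5, 97.5])

    # Null hypothesis of equal means: shift both groups to the combined mean.
    combined_mean = np.concatenate([a, b]).mean()
    null_diffs = (replicate_means(a - a.mean() + combined_mean)
                  - replicate_means(b - b.mean() + combined_mean))

    # One-sided p-values, mirroring p_high and p_low above.
    p_high = np.mean(null_diffs >= observed)
    p_low = np.mean(null_diffs <= observed)
    return observed, conf_int, p_high, p_low

# Synthetic "ratings" for two groups whose true means differ by 0.4.
data_rng = np.random.default_rng(0)
group_a = data_rng.normal(4.2, 1.0, size=300)
group_b = data_rng.normal(3.8, 1.0, size=300)

observed, conf_int, p_high, p_low = bootstrap_mean_diff(group_a, group_b)
print(observed, conf_int, p_high, p_low)
```

A small `p_high` suggests the first group's mean is higher than would be expected if both groups shared one mean; a small `p_low` suggests the opposite. The two values sum to roughly 1, so at most one of them can be small.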

Distribution of Restaurants

Here, I examine the geographical distribution of the restaurants used in this analysis.


In [9]:
# create subset dataset with only unique businesses, not all reviews
restaurant_df = df_clean[['restaurant_id', 'alias', 'categories', 'city_x', 'coordinates', 
                          'is_closed', 'location','name', 'price', 'rating', 'review_count', 
                          'transactions']].drop_duplicates()

# number of restaurants per location
restaurant_df['city_x'].value_counts()
Out[9]:
Jefferson, Louisiana          200
Oakland, Michigan             200
Wayne, Michigan               200
Macomb, Michigan              200
Middlesex, Massachusetts      200
Suffolk, New York             200
Los Angeles, California       200
Westchester, New York         200
Bergen, New Jersey            200
Cook, Illinois                200
Philadelphia, Pennsylvania    200
New Haven, Connecticut        200
Fairfield, Connecticut        200
Middlesex, New Jersey         200
New York City, New York       200
Hartford, Connecticut         200
Morris, New Jersey            198
Essex, New Jersey             197
Rockland, New York            197
Union, New Jersey             157
Hudson, New Jersey            115
Orleans, Louisiana            112
Passaic, New Jersey            93
King, Washington               72
Nassau, New York               55
Name: city_x, dtype: int64
In [10]:
# convert coordinates into unique columns for mapping
def coordinate_split(coordinates):
    coordinates = re.split(', |: |}', coordinates)
    return coordinates[1], coordinates[3]
restaurant_df[['latitude', 'longitude']] = restaurant_df['coordinates'].apply(coordinate_split).apply(pd.Series)
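
The regex split above returns latitude and longitude as strings. An alternative sketch, assuming every `coordinates` value is a dict-like string of the form shown in the table output, parses the literal directly and returns floats. The helper name `parse_coordinates` is hypothetical.

```python
import ast

def parse_coordinates(raw):
    """Return (latitude, longitude) as floats from a dict-like string."""
    # literal_eval only evaluates Python literals, so no arbitrary code is run.
    coords = ast.literal_eval(raw)
    return float(coords['latitude']), float(coords['longitude'])

lat, lon = parse_coordinates("{'latitude': 42.3469, 'longitude': -71.54837}")
print(lat, lon)
```

This version raises an error on a missing or `None` coordinate instead of silently returning a malformed string, which may or may not be the desired behavior for the mapping step.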
In [11]:
restaurant_df['text'] = restaurant_df['city_x'] + ' (' + restaurant_df['name'] + ')'
data = go.Scattergeo(lon = restaurant_df['longitude'],
                     lat = restaurant_df['latitude'],
                     text = restaurant_df['text'],
                     mode = 'markers',
                     #marker = dict(symbol='star', size=5, colorscale='Reds')
                     #marker_color = restaurant_df['']
                    )
layout = dict(title = 'Restaurants in Highly COVID-19-Affected Areas',
              geo_scope = 'usa')
choromap = go.Figure(data=[data], layout=layout)
iplot(choromap)
choromap.write_html('plotly_figures/restaurant_distribution.html')

If there are issues viewing this figure, you can access interactive plotly .html files through the plotly_figures.zip folder in the repository.

These areas are highly localized to major US cities. Because the data was pulled by searching Yelp with just the county name, the restaurants included are those close to the center of each county.

Some coordinates are also clearly wrong for specific restaurants: one Pizza Hut in Orleans shows up in Virginia, and The Mark in New Jersey shows up in Ohio.

In [12]:
# add a column to denote reviews before and after March 15, when restaurant closures began
df_clean.loc[df_clean['publish_date']>='2020-03-15', 'Post-COVID Lockdown'] = 1
df_clean['Post-COVID Lockdown'] = df_clean['Post-COVID Lockdown'].fillna(0)

# drop rows without a description
df_clean = df_clean[df_clean['description'].notna()]

# we have a limited sample of reviews after lockdown - scraping a longer time-period may help
df_clean['Post-COVID Lockdown'].value_counts()
Out[12]:
0.0    315354
1.0      5038
Name: Post-COVID Lockdown, dtype: int64
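
The same flag can be sketched in one step from a boolean comparison, which avoids the separate fill step. The toy dates below are invented; only the cutoff and the column name come from the cell above.

```python
import pandas as pd

toy = pd.DataFrame({
    'publish_date': pd.to_datetime(['2020-03-14', '2020-03-15', '2020-04-03', None]),
})

cutoff = pd.Timestamp('2020-03-15')

# Comparisons against a missing date (NaT) evaluate to False, so undated rows
# land in the pre-lockdown group, matching the fill-with-0 behavior.
toy['Post-COVID Lockdown'] = (toy['publish_date'] >= cutoff).astype(float)

print(toy['Post-COVID Lockdown'].tolist())
```

Note that the cutoff is inclusive: a review published on March 15 itself counts as post-lockdown.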

Sentiment Analysis

  a. Data Preparation for Sentiment Analysis
  b. Overall Sentiment
        VADER Scores
        VADER Scores v. Review Stars
        Overall Sentiment by Region

  c. Sentiment Pre- and Post-Lockdown
        Review Stars
        Compound VADER Scores
        Positive VADER Scores
        Negative VADER Scores

  d. Sentiment Pre- and Post-Lockdown by Region
        Review Stars by Region
        Compound Scores by Region
        Positive Scores by Region
        Negative Scores by Region

  e. Sentiment Analysis Conclusions



Data Preparation for Sentiment Analysis

For this part of the sentiment analysis, I'll use NLTK's VADER toolkit, which uses lemmatized text, punctuation, and capitalization to assign positive, negative, neutral, and compound scores to each review. Here, I clean the data as needed for VADER input.

In [13]:
# clean data for sentiment analysis
nlp = spacy.load('en')

# keep punctuation and caps because VADER uses these
# do get rid of stop words & lemmatize

# list of stop words
stop_words = spacy.lang.en.stop_words.STOP_WORDS

def sentiment_tokenizer(review):
    # remove new lines
    review = review.replace('\n', ' ')
    
    mytokens = nlp(review)
    
    # lemmatize, remove spaces
    mytokens = [word.lemma_ if word.lemma_ != '-PRON-' else word.text for word in mytokens if word.is_space==False]
    
    # remove stop words
    mytokens = [word for word in mytokens if word not in stop_words]
    
    # join tokens back into a sentence
    clean_review = ' '.join(mytokens)
    
    return clean_review


sentiment_tokenizer(df_clean.loc[0, 'description'])
Out[13]:
'I leave . I thoroughly pleased dinner choice . huge hockey party like sit want kid like worry . Paul attentive kind . He great suggestion . We Chicken Carbonara Nachos Chilli . OMFG good . chicken tender delightful Nachos phenomenal . You surprise place mess nachos , guy right . Crispy chip , cheesey , delicious fresh tomato boot . My husband I stop finish . room belly , taste bud demand . I stoke new location Hudson open fast ! I stop eat . It good ! I .'
In [14]:
# apply sentiment tokenizer to all reviews
df_clean['sentiment_review'] = df_clean['description'].apply(sentiment_tokenizer)
In [15]:
# add VADER scores as columns based on cleaned up text
sid = SentimentIntensityAnalyzer()
df_clean[['neg', 'neu', 'pos', 'compound']] = df_clean['sentiment_review'].apply(lambda x: sid.polarity_scores(x)).apply(pd.Series)
df_clean.head()
Out[15]:
restaurant_id alias categories city_x coordinates is_closed location name price rating ... description publish_date review_number score Post-COVID Lockdown sentiment_review neg neu pos compound
0 --CprxtcUfzKoz29hAzm5w wellys-restaurant-marlborough [{'alias': 'tradamerican', 'title': 'American ... Middlesex, Massachusetts {'latitude': 42.3469, 'longitude': -71.54837} False {'address1': '153 Main St', 'address2': '', 'a... Welly's Restaurant $$ 4.0 ... I haven't even left yet. But I am thoroughly p... 2020-01-24 1.0 5.0 0.0 I leave . I thoroughly pleased dinner choice .... 0.128 0.460 0.412 0.9814
1 --CprxtcUfzKoz29hAzm5w wellys-restaurant-marlborough [{'alias': 'tradamerican', 'title': 'American ... Middlesex, Massachusetts {'latitude': 42.3469, 'longitude': -71.54837} False {'address1': '153 Main St', 'address2': '', 'a... Welly's Restaurant $$ 4.0 ... I've been to Welly's a few times and each time... 2019-11-12 2.0 4.0 0.0 I Welly time time , food incredible ! recently... 0.052 0.733 0.215 0.8745
2 --CprxtcUfzKoz29hAzm5w wellys-restaurant-marlborough [{'alias': 'tradamerican', 'title': 'American ... Middlesex, Massachusetts {'latitude': 42.3469, 'longitude': -71.54837} False {'address1': '153 Main St', 'address2': '', 'a... Welly's Restaurant $$ 4.0 ... Welly's is a very popular bar right in downtow... 2019-11-08 3.0 4.0 0.0 Welly popular bar right downtown historic Marl... 0.020 0.668 0.312 0.9899
3 --CprxtcUfzKoz29hAzm5w wellys-restaurant-marlborough [{'alias': 'tradamerican', 'title': 'American ... Middlesex, Massachusetts {'latitude': 42.3469, 'longitude': -71.54837} False {'address1': '153 Main St', 'address2': '', 'a... Welly's Restaurant $$ 4.0 ... I used to think the idea of fish tacos were no... 2020-04-03 4.0 5.0 1.0 I use think idea fish taco appetizing . ... I ... 0.121 0.584 0.296 0.5411
4 --CprxtcUfzKoz29hAzm5w wellys-restaurant-marlborough [{'alias': 'tradamerican', 'title': 'American ... Middlesex, Massachusetts {'latitude': 42.3469, 'longitude': -71.54837} False {'address1': '153 Main St', 'address2': '', 'a... Welly's Restaurant $$ 4.0 ... Came here with some work colleagues and ordere... 2020-01-03 5.0 4.0 0.0 come work colleague order drink burger . I bur... 0.000 0.699 0.301 0.9281

5 rows × 24 columns

In [16]:
# because text data takes forever to clean, add checkpoints where data is exported to a csv
df_clean.to_csv('checkpoint1.csv')

Overall Sentiment

Evaluate review sentiment across all regions and all times.


VADER Scores Explained

NLTK's VADER toolkit gives a positive, negative, and neutral score to text data, along with a compound score that combines the three. The package authors describe the compound score as follows:

The compound score is computed by summing the valence scores of each word in the lexicon, adjusted according to the rules, and then normalized to be between -1 (most extreme negative) and +1 (most extreme positive). This is the most useful metric if you want a single unidimensional measure of sentiment for a given sentence. Calling it a 'normalized, weighted composite score' is accurate.

-- Source

Positive: Scale of 0 to 1, neutral to positive.
Negative: Scale of 0 to 1, neutral to negative.
Compound: Scale of -1 to 1, negative to positive. Combines positive, negative, and neutral sentiments to create an overall sentiment metric.
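
To illustrate the "normalized" part of the compound score, here is a minimal sketch of the squashing step. It assumes the form x / sqrt(x^2 + alpha) with alpha = 15, which is my understanding of the reference implementation; the function name `normalize_compound` and the example valence sums are made up for illustration, and the rule-based adjustments that produce the sum are not reproduced.

```python
import math

def normalize_compound(valence_sum, alpha=15):
    """Squash an unbounded valence sum into the open interval (-1, 1)."""
    # alpha = 15 is an assumption about the library's default, not verified here.
    return valence_sum / math.sqrt(valence_sum ** 2 + alpha)

# The larger the summed valence, the closer the score gets to +1 or -1.
for total in [0.5, 2, 4, 10, 30]:
    print(total, round(normalize_compound(total), 4))
```

Because the valence sum grows with the number of sentiment-bearing words, long reviews tend to saturate near -1 or +1. That is worth keeping in mind when reading the extreme-score reviews, which may be extreme partly because of their length.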

In [17]:
df_clean = pd.read_csv('checkpoint1.csv', index_col='Unnamed: 0')
df_clean.head()
Out[17]:
restaurant_id alias categories city_x coordinates is_closed location name price rating ... description publish_date review_number score Post-COVID Lockdown sentiment_review neg neu pos compound
0 --CprxtcUfzKoz29hAzm5w wellys-restaurant-marlborough [{'alias': 'tradamerican', 'title': 'American ... Middlesex, Massachusetts {'latitude': 42.3469, 'longitude': -71.54837} False {'address1': '153 Main St', 'address2': '', 'a... Welly's Restaurant $$ 4.0 ... I haven't even left yet. But I am thoroughly p... 2020-01-24 1.0 5.0 0.0 I leave . I thoroughly pleased dinner choice .... 0.128 0.460 0.412 0.9814
1 --CprxtcUfzKoz29hAzm5w wellys-restaurant-marlborough [{'alias': 'tradamerican', 'title': 'American ... Middlesex, Massachusetts {'latitude': 42.3469, 'longitude': -71.54837} False {'address1': '153 Main St', 'address2': '', 'a... Welly's Restaurant $$ 4.0 ... I've been to Welly's a few times and each time... 2019-11-12 2.0 4.0 0.0 I Welly time time , food incredible ! recently... 0.052 0.733 0.215 0.8745
2 --CprxtcUfzKoz29hAzm5w wellys-restaurant-marlborough [{'alias': 'tradamerican', 'title': 'American ... Middlesex, Massachusetts {'latitude': 42.3469, 'longitude': -71.54837} False {'address1': '153 Main St', 'address2': '', 'a... Welly's Restaurant $$ 4.0 ... Welly's is a very popular bar right in downtow... 2019-11-08 3.0 4.0 0.0 Welly popular bar right downtown historic Marl... 0.020 0.668 0.312 0.9899
3 --CprxtcUfzKoz29hAzm5w wellys-restaurant-marlborough [{'alias': 'tradamerican', 'title': 'American ... Middlesex, Massachusetts {'latitude': 42.3469, 'longitude': -71.54837} False {'address1': '153 Main St', 'address2': '', 'a... Welly's Restaurant $$ 4.0 ... I used to think the idea of fish tacos were no... 2020-04-03 4.0 5.0 1.0 I use think idea fish taco appetizing . ... I ... 0.121 0.584 0.296 0.5411
4 --CprxtcUfzKoz29hAzm5w wellys-restaurant-marlborough [{'alias': 'tradamerican', 'title': 'American ... Middlesex, Massachusetts {'latitude': 42.3469, 'longitude': -71.54837} False {'address1': '153 Main St', 'address2': '', 'a... Welly's Restaurant $$ 4.0 ... Came here with some work colleagues and ordere... 2020-01-03 5.0 4.0 0.0 come work colleague order drink burger . I bur... 0.000 0.699 0.301 0.9281

5 rows × 24 columns

Below, I print the reviews with the highest and lowest compound scores as a sanity check.
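
Since the highest and lowest checks differ only in `max` versus `min`, the row selection could be factored into one small helper. This is a sketch on invented toy scores; `extreme_reviews` is a hypothetical name, not a function from the notebook.

```python
import pandas as pd

def extreme_reviews(frame, column='compound', which='max'):
    """Return every row tied for the highest or lowest value in `column`."""
    target = frame[column].max() if which == 'max' else frame[column].min()
    return frame[frame[column] == target]

toy = pd.DataFrame({
    'name': ['A', 'B', 'C', 'D'],
    'compound': [0.9999, -0.9987, 0.9999, 0.25],
})

print(extreme_reviews(toy, which='max')['name'].tolist())
print(extreme_reviews(toy, which='min')['name'].tolist())
```

Filtering on equality with the extreme value, rather than taking a single top row, keeps ties, which matters here because several reviews can share the same rounded compound score.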

In [18]:
# show reviews with highest vader compound scores
for idx, row in df_clean[df_clean['compound']==df_clean['compound'].max()].iterrows():
    print('\n---Restaurant Name: {}---'.format(row['name']))
    print('City: {}'.format(row['city_x']))
    print('Review Score: {}'.format(row['score']))
    print('Review Date: {}\n'.format(row['publish_date']))
    print(row['description'])
---Restaurant Name: Vedge---
City: Philadelphia, Pennsylvania
Review Score: 5.0
Review Date: 2017-11-27

We did it. We tried nearly everything on the Vedge menu. Between vegetables being relatively un-filling and us being a relatively ambitious party, we conquered this meal. I hope Vedge is proud of us, because I'm proud of Vedge. Though if I'm being right-sized about this, Vedge is a far superior entity than I am, so I'll tweak that to say I'm "impressed" rather than proud. Our experience here was fabulous, surpassing our already-sky high expectations. I eat almost exclusively vegan and adore vegetables. Vedge, as the name would suggest, truly has mastery over the beautiful, colorful, delicious magic that grows from the ground. The techniques, the flavors, the creativity--it's all there in every dish. This was a special dining experience made all the more special by the top-notch service. There's a reason for the acclaim, and I can now speak to it.  

After what felt like years of buildup, my family made it t here on Wednesday. It's been one of the top restaurants in Philly for a little while now and the fact that my dietary restrictions cause virtually no problems for me here make it even more exciting. Everything here is vegan. Ron Swanson's nightmare but my most glorious fantasy. The space itself is inviting and warm--it feels like it used to be someone's very welcoming and elegant home. 

If it wasn't enough to have a superb food menu, the cocktail menu at Vedge is top-notch too. Between me, my dad and my bf, I got to try three drinks, all of which were yummy. I went with the Apple Catcher (rye, apple cider, ACV, black pepper), which was warming and delicious. Loved the kick from the ACV and pepper. My dad's Black Hole Sun (scotch, burnt miso, maria al monte, charcoal), was decadent but definitely more of a sipper and/or something you just get one of. Rich, thick, complex and original. The Elder Sage (gin, elderflower, lemon, smacked sage) was the least interesting of the bunch but still refreshing and very well-balanced. Fantastic all-around. 

Food-wise, Vedge is pretty much all shared plates. There are three categories to choose from: The "Vedge Bar" (think bar bites), the Dirt List (apps featuring seasonal ingredients) and the Grill (heartier plates that are more long-standing). You're encourage to pick one of three as your full meal. As a party of five, we got 15 dishes, nearly trying everything on the menu. The ones I'll list below are only the ones I personally tried. Overall, the food here is sublime. Vedge showcases vegetables in all of their glory, helping even carnivores the temporary forget their typical preferences. The dishes are not only executed perfectly but they're creative, interesting and varied. A masterpiece of a night. Our eats, below: 

Fancy Radishes w/ smoked tamari, yuzu avocado, pickled tofu, shishito: A+
Served sushi style, four different ways. The smoked tamari and yuzu avocado were my favorites. Creative, beautiful and tasty.

Portobello Carpaccio w/ deviled turnip, caper puree, nigella grissini: A+
Masterful job with this preparation. For me, just as satisfying as beef carpaccio though obviously quite different. Incredible flavors, again magically creative.

Stuffed Avocado w/ romesco, pickled cauliflower, "fried rice," black salt: A
Healthiest stuffed avo I've had for sure. The cauliflower made it feel particularly light. 

Rutabaga Fondue w/ soft pretzel, pickle, charred onion: A+
The fact that this was vegan was mind-blowing to all of us. I'm gluten-free so could only eat the veggies, but the fondue itself was unreal. Truly, it did not seem real. 

Brussels Sprouts, char grilled w/ baby shittake, brussels kimchee: A+
Fun preparation, loved it with the kimchee. Rich and super flavorful.

Pea Leaves, flash seared w/ smoked onion dashi, baked tofu, furikake: A
Light, simple and elegant. Great flavors, not overpowering.

Nebrodini Mushrooms as "fazzoletti", tomato, basil: A+
Vedge can transform mushrooms in a way I've never seen before. The carpaccio was incredible and now we've got pasta-style? 

Smoked Potato "kedgeree" w/ curry rice, tofu vindaloo: A+
Newer flavors added to the mix here. Interesting preparation, great flavors, very graceful. 

Wood-Roasted Carrot w/ pumpernickel, garbanzos, carrot mustard, carrot kraut: A++
One of the signature dish's here, and for good reason. The garbanzos and carrot mustard were divine and the carrot was likely the best carrot I've ever had.

Ssamjang-Glazed Tofu w/ edamame puree, burnt miso, cucumber, sea beans, toasted nori: A
Showcasing mastery over styles here, another perfectly-executed dish. Miso/edamame were delicious and sophisticated accompaniments.

Grilled Chioggia Beet w/ porcini steak sauce, black lentils, kohlrabi, truffle: A+
Rich, warming and hearty, perfectly seasoned and fall-centric. A group fav.

Seared Maitake Mushroom w/ celery root fritter, smoked leek remoulade: A+
Each component, though separated, worked to create an earthy, cohesive whole. Ending on root veggies made for satisfying, on-season finale.

---Restaurant Name: Serpico---
City: Philadelphia, Pennsylvania
Review Score: 5.0
Review Date: 2018-07-31

It was exciting to try this place in Philadelphia. With one night in town for a special meal, I was happy to still get a reservation here, since Zahav was booked up.

The menu is super-ambitious and was unusual, and the food is gorgeous..  I didn't love every dish equally but this was a terrific experience and a lot of fun. 

COOL STUFF:
--super-chill neighborhood atmosphere.  Very unusual for a James Beard-winning chef's restaurant.  Small, super friendly and professional maitresse d' (who also gave great wine recommendations).

--floral cocktail was nice, as Yelpers had suggested, but it was a rainy night so it was maybe not the best choice; it wasn't spectacular and it was pretty light; I wish I had tried a bourbon cocktail instead

--wine list is well-curated, only 4 glasses of red. I was not impressed with the cab (they let me try a sip), but the temperanillo was perfectly respectable and went well with the food

--restaurant is super low-carb, heavy on vegetables and proteins. I think they only carb I had besides dessert was the quinoa/flax (I think) crisps that come in place of bread. That came with a really lovely smoked black bean (I think) spread or a really yummy kind of butter (I forget what kind). YUM.

MEDIUM STUFF:
--roasted beet dish was beautiful, but I liked it more visually; the flavor was less exciting for me (and I do love beets)

--ditto with the melon salad; others had it with the dehydrated ham sprinkles but mine was without it as a veg.; maybe it would have been more exciting with it; I wish the server had not suggested it was a great dish even without the ham

--king oyster mushroom: my friends liked this more than I did; it was interesting but not really memorable 

COOLER STUFF
--I really liked the eggplant dish. Yum. Too bad I was sharing with my table - I would have liked this all to myself instead of having part of the beets, melon and king oyster.

REALLY COOL STUFF
--Halibut with carrot crust: Super yum. If I had just ordered this and the eggplant I could have died happy. Again, if I wasn't sharing I could have had more than 2 bites of it. Mmmmm......

--Enoki mushrooms with pecorino broth and parmesan: YUM. This was like spaghetti cacio e pepe except with enoki mushrooms. I could eat this all day, every day. Super delicious and special. 

Ahh....dessert.  In a restaurant like this where the vegetables and proteins are the stars (just check out the photos) I was not expecting dessert to be anything special. But since I'd had only a bite of my faves and some dishes were not as awesome, and there were no heavy carbs, I was still very hungry at dessert. To the rescue: the super badass apple cake with caramel and vanilla ice cream. PERFECT dessert, excellent portion - and my friends were full so I did not have to share. Ahh.....

AND THE BEST OF ALL
--The place has an open kitchen, so we had been able to watch Peter Serpico and his team of line cooks work efficiently and precisely all evening. And since we closed the place down (slow weeknight), we got to watch the magic of it being cleaned and closed down, and for the staff to start hanging out at empty tables. Chef Serpico himself stopped by our table as he shifted from kitchen-work to hang-out to see how we had liked the meal.  Very cool guy - it was a pleasure to be able to eat in his very impressive restaurant, even if I wish I had not been sharing so I could have had my fave dishes all to myself! 

Next time, I would do: ENOKI MUSHROOMS, EGGPLANT, and HALIBUT, - with APPLE CAKE for dessert (or maybe I'd skip the eggplant if I was not super hungry). That would be an A+ extraordinary meal. That plus the neighborhood vibe makes this a definite 5 stars even if I didn't love every single dish.

---Restaurant Name: jeong---
City: Cook, Illinois
Review Score: 4.0
Review Date: 2019-03-04

Honest review: i love korean food n this by far is the best korean food in Chicago a lot been then parachute. Ambiance: its gorgeous inside perfect for date nights or to chill with friends. Service: its amazing staff is attentive, change plates, describe, things supper nice. Food: food is really good def will get a michelin star just needs a few tweaks. One thing since they are going for fancy korean they need to make everything look fancy n make fancy dishes felt some menu items shouldnt be on the menu like the duck confit? Duck breast is better n like the odeng more could have been done to elevate the dish.  Alot of dishes need something crunchy. Would have loved to see cool ingredients like foie gras truffle caviar n sweetbreads be unique do something different push the envelope. A take home dessert would be perfect like it was my gfs birthday n they gave us take home cupcakes n they were amazing. Another thing the desserts werent anything special. a lot of people have a sweet tooth so making amazing desserts is very important, so if they dont have a pastry chef they should get one. Thought the tasting menu tasted better then the al la cart menu so they need to fix that. Drinks were good but wish they had more of a wine selection sweet desserts are important n more sake plz. Menu needs more refinement great start just keep pushing yourself to be better best things: silken tofu, mackerel/ salmon/ scallop/duck/steak/ doubuki/ gingerbread dessert

Silken tofu: delicous tastes like that ginger salad at habachi restaurants. The tofu was so soft. N the crab was really sweet n went perfectly with the greenery just add texture. 

Mackerel: delicous supper complex sauce was spicy/ sweet. Love the fried rice for crunch n the matcha for bitterness. Only issue Needs a  different fish cause mackerel is to fishy raw scallops would have been perfect 

Salmon: was amazing loved all the different flavors n textures. I usually hate dairy with fish but it adds richness to the dish my only complaint is that it is 
a lil salty

Odeng: 3/5 Meatballs were moist just wish they were crispier n had more flavor inside. Aioli was amazing just it needs more. put more radishes they were good it cut through the richness. just dish needed more components n needed to be refined more

Scallop dish was amazing. Scallop was so tender. The  supper complex sauce was so fucking good they need to bottle it. Loved the spinach just put more. Dish needed texture

Mandu: didnt like werent flavorful tasted like any ordinary dumpling ur used too. The other components didnt really add anything. 

Duck: was amazing cooked perfectly n was actually crispy my only complaint it was such a tiny portion. sauce was sweet n savory kimichi was amazing just put more rice was meh wish it had more flavor. 

Tteokbokki: was amazing perfect combo of sweet n spicy. Loved the egg for richness. Wish this had fishcakes only complaint needs texture

Steak:  was amazing. Was flavorful just needed to be cooked longer. The coquet was delicious. The sauce was define n there was a lot of it. Loved the puree adding more richness. No complaints 


duck confit. was gross it tasted weird like tuna. Texturally it was off putting looked n ate like
Spam.  The other components didnt really add anything 

Ginger bread. was amazing the granita was sour n went perfectly with the white sauce. The gingerbread was crunchy n flavorful no complaints

Chestnut: this was alright. Cake was dry cream on top was good. The chocolate on sauce didnt add anything wasnt sweet enough. I love chestnut desserts but this needs updating components didnt work together n cake wasn't the best

Chocolate cake: good not great it was a good cake but nothing special. Wasnt moist liked the icecream n the crumble though. The marshmallow whip was to sticky n hard was unpleasant to eat


rice. the mochi made the dessert very sweet n was awkwardly textured.  If u ate everything else but the mochi it was fine but nothing special
In [19]:
# show reviews with lowest vader compound scores
for idx, row in df_clean[df_clean['compound']==df_clean['compound'].min()].iterrows():
    print('\n---Restaurant Name: {}---'.format(row['name']))
    print('City: {}'.format(row['city_x']))
    print('Review Score: {}'.format(row['score']))
    print('Review Date: {}\n'.format(row['publish_date']))
    print(row['description'])
---Restaurant Name: ReBAR & kitchen---
City: Bergen, New Jersey
Review Score: 1.0
Review Date: 2020-02-14

UPDATE: KNOW YOUR MONEY IS GOING TO A RACIST!

I've had to soak on this for awhile, after witnessing what I did, but after reading the other reviews, hearing the horror stories from staff, and hearing the crap straight from the horse's mouth... It's safe to say REBAR is dead to me, and I will no longer be spending my hard earned money, nor referring countless people from the area, from work or even as far as NY, to spend their hard earned money either at REBAR.

So why the change of heart? One word... Racism. Yep, you read that right... The one ugly term we hope to never confront, and maybe we're to busy with ME TOO, Social Justice, and the whole impeachment thing, that we've forgotten about RACISM... Because we never think it would rear its ugly head in our backyard. The owner is a flat out racist, and I've heard the rumors, I've heard the whispers, and I've even witnessed the racist humor --- the latter sometimes and unfortunately WE all forgive, because racist humor pokes fun at EVERYONE, no one is safe and we hide behind the comedy aspect of it... But when you are a racist and approach your customers with racism well that's simply just not right, it's not cool, and I won't stand for it.

Last week, the owner, GERARD (I don't know who this Jay K. or P. or whatever is on Yelp --- the owner's name is Jerard / GERARD) walked in buzzed / tipsy --- I've seen this a handful of times before, but that night he wanted to be extra funny and I guess he gave no f*cks. The gentleman parallel to me was wearing a hoody, and Gerard the owner pulled it off of him. They got into a heated exchange and the sentiment he was outpouring was the same that most patrons complain about - Lack of respect. They patched it up, and it seemed they knew each other and they peaced it out, but it seemed at one point it was ready to go to blows - the hooded gent was simply tired of Gerard's shit. Afterwards he offered to buy him a drink and Gerard said "I only drink with WHITE PEOPLE".... ok, shitty joke you might think, but then it's followed up with - "actually you'll pass, as long as you're not a NIGGER OR A SPIC" --- I mean WTF!!!??? The holy fuck!? Just before this, this prick shook my hand... I've been going to this place for 5+ years, and I've witnessed him be nasty to his staff, micro manage the shit out of the place, and just constantly be negative and complain about everything.... I've walked in after an event where he screamed and demeaned a patron because he gave the place a negative review and refused to sit him and his guest... I've seen it all, and I've taken my breaks from the place, and I've even had amazing nights where he doesn't even enter the bar - those are the best nights, even though he waits in his Jeep Cherokee watching everyone's move from his car and phone app he uses to watch the bar.... 

He's never disrespected me to my face, or offended me or my guests, and that's maybe because I try to keep my engagement  with him at a minimum... but to witness the BS with others, and finally literally next to me hear him say such an offensive thing, and then throw racial slurs like nothing at talking volume and not give AF who was there cause he was tipsy, and then have his staff write it off as "oh he must be drunk heh heh"... so that means this happens often??

To see this older man act this way, still try and get laid at 55-60 years old, be non-stop miserable and just be a rotten human being to his patrons, staff and now add RACIST - I mean enough is enough , I don't wanna be a part of this place, I don't want to give my money to this place, and I don't want to be any part of the reason rebar keeps its doors open --- ADD TO ALL OF THIS CRAPPY CRAP BULLSHIT that this owner is about and the culture he's created, the man keeps raising his dive bar food again again and again! Yes, we appreciate organic , fresh ingredients and amazing food, BUT remember, this is a friggin dive bar in random Lodi --- $2 one month, $2 the next... in a year alone the prices have gone up twice or more. Say goodbye to the $10 burger... $14 and up for every burger now - for that I'll go to Zin Burger.... AND THEN YOU WANT TO CHARGE 3.9% for every credit card transaction? Lol lol lol this is nuts! 

This is a man that LITERALLY FIGHTS you for leaving a negative review, but take your time to read the negative reviews, they come from people of color - I now know this is not a coincidence. I've seen how they've been made to feel unwelcome at the bar, I've seen his refusal to turn off Fox News, I've heard his pig headed opinions, and now I've seen the racism up front and direct. SO GOODBYE REBAR! I can't wait to celebrate your demise....

"JAY K." / GERARD don't bother responding to this post... You know all of what I am writing is true, and for once, act your age, act like a boss, and accept the hard and true criticism and reality... F*CK THIS PLACE!!!

Relationship of VADER Scores and Review Stars

Because Yelp reviews are accompanied by a 1-5 star rating, we can compare compound, positive, and negative VADER scores against the stars to see how well they track the reviewers' self-assigned ratings.

Return to Top: Overall Sentiment

In [20]:
# plot distribution of compound scores by review score

fig, axes = plt.subplots(5, 1, figsize=(10, 15))

sns.distplot(df_clean.loc[df_clean['score']==1,'compound'], hist=False, color='tab:red', label='1 Star', ax=axes[0])
sns.distplot(df_clean.loc[df_clean['score']==2,'compound'], hist=False, color='tab:orange', label='2 Stars', ax=axes[1])
sns.distplot(df_clean.loc[df_clean['score']==3,'compound'], hist=False, color='tab:olive', label='3 Stars', ax=axes[2])
sns.distplot(df_clean.loc[df_clean['score']==4,'compound'], hist=False, color='tab:green', label='4 Stars', ax=axes[3])
sns.distplot(df_clean.loc[df_clean['score']==5,'compound'], hist=False, color='tab:blue', label='5 Stars' ,ax=axes[4])
axes[0].set_title('Compound VADER Score by Review Stars')
for ax in range(4):
    axes[ax].set_xlabel('')
for ax in range(5):
    axes[ax].set_xlim(-1,1)
axes[4].set_xlabel('Compound VADER Score')
plt.tight_layout()

Reviews with 1 or 2 stars are more likely to have low compound VADER scores, as expected; reviews with 3 or more stars mostly have highly positive compound scores.

In [21]:
# plot distribution of positive scores by review score

fig, axes = plt.subplots(5, 1, figsize=(10, 15))

sns.distplot(df_clean.loc[df_clean['score']==1,'pos'], hist=False, color='tab:red', label='1 Star', ax=axes[0])
sns.distplot(df_clean.loc[df_clean['score']==2,'pos'], hist=False, color='tab:orange', label='2 Stars', ax=axes[1])
sns.distplot(df_clean.loc[df_clean['score']==3,'pos'], hist=False, color='tab:olive', label='3 Stars', ax=axes[2])
sns.distplot(df_clean.loc[df_clean['score']==4,'pos'], hist=False, color='tab:green', label='4 Stars', ax=axes[3])
sns.distplot(df_clean.loc[df_clean['score']==5,'pos'], hist=False, color='tab:blue', label='5 Stars' ,ax=axes[4])
axes[0].set_title('Positive VADER Score by Review Stars')
for ax in range(4):
    axes[ax].set_xlabel('')
axes[4].set_xlabel('Positive VADER Score')
for ax in range(5):
    axes[ax].set_xlim(0,1)
plt.tight_layout()

As expected, the positive VADER score shifts clearly to the right as stars increase. Of note, there are peaks at 0: these are reviews that contain no positive terms. My guess is that these peaks are driven by review length rather than by sentiment: in a relatively short review, a single positive word dramatically changes the positive score, so short reviews without one land at exactly 0.
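One way to probe the review-length explanation is to compare word counts for reviews with a positive score of 0 against all other reviews. The sketch below uses a small hypothetical frame with the same `description` and `pos` columns as `df_clean`; the toy rows, the `n_words` and `group` columns, and the whitespace tokenization are illustrative assumptions, not values from the dataset.

```python
import pandas as pd

# Hypothetical stand-in for df_clean with made-up rows (illustration only).
toy = pd.DataFrame({
    'description': [
        'Basic diner food, average prices.',
        'Thick crust pizza.',
        'Great food and friendly staff, we loved the cozy patio and will be back soon.',
        'Amazing tacos, wonderful service, and the best margaritas I have had in years.',
    ],
    'pos': [0.0, 0.0, 0.41, 0.52],
})

# Word count per review, splitting on whitespace.
toy['n_words'] = toy['description'].str.split().str.len()

# Label each review by whether it contains any positive terms.
toy['group'] = toy['pos'].gt(0).map({True: 'pos > 0', False: 'pos == 0'})

# Median review length in each group.
median_length = toy.groupby('group')['n_words'].median()
print(median_length)
```

If the median length for the `pos == 0` group is much smaller on the real data, that would support the review-length explanation; if not, the peaks need another explanation.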

In [22]:
df_clean.loc[df_clean['pos']==0, ['description', 'neg', 'neu', 'pos', 'compound']].head()
Out[22]:
description neg neu pos compound
269 There goes the diet! Gonna have to to go ther... 0.000 1.000 0.0 0.0000
412 Thick crust pizza, if that's your taste (not m... 0.000 1.000 0.0 0.0000
606 Worst diner ever. Gyro wrap was awful. Served ... 0.394 0.606 0.0 -0.7579
612 Basic Diner food, average prices. 0.000 1.000 0.0 0.0000
663 Cold food was delivered to the table. Hair in ... 0.171 0.829 0.0 -0.4767
In [23]:
# plot distribution of negative scores by review score

fig, axes = plt.subplots(5, 1, figsize=(10, 15))

sns.distplot(df_clean.loc[df_clean['score']==1,'neg'], hist=False, color='tab:red', label='1 Star', ax=axes[0])
sns.distplot(df_clean.loc[df_clean['score']==2,'neg'], hist=False, color='tab:orange', label='2 Stars', ax=axes[1])
sns.distplot(df_clean.loc[df_clean['score']==3,'neg'], hist=False, color='tab:olive', label='3 Stars', ax=axes[2])
sns.distplot(df_clean.loc[df_clean['score']==4,'neg'], hist=False, color='tab:green', label='4 Stars', ax=axes[3])
sns.distplot(df_clean.loc[df_clean['score']==5,'neg'], hist=False, color='tab:blue', label='5 Stars' ,ax=axes[4])
axes[0].set_title('Negative VADER Score by Review Stars')
for ax in range(4):
    axes[ax].set_xlabel('')
for ax in range(5):
    axes[ax].set_xlim(0,1)
axes[4].set_xlabel('Negative VADER Score')
plt.tight_layout()

With negative scores, we see a shift toward neutral sentiment for 2-4 stars, as expected. The highest negative scores (recall that a higher negative score indicates more negative sentiment) are observed in reviews with fewer stars.

Again, we see odd peaks at 0, and those can also be explained by reviews that contain no negative terms.

In [24]:
df_clean.loc[df_clean['neg']==0, ['description', 'neg', 'neu', 'pos', 'compound']].head()
Out[24]:
description neg neu pos compound
4 Came here with some work colleagues and ordere... 0.0 0.699 0.301 0.9281
8 Very cute and cozy pub spot. Perfect for a cas... 0.0 0.489 0.511 0.9854
11 Excellent food, service and atmosphere!\nAweso... 0.0 0.346 0.654 0.9632
15 I had the chicken parmigiana. It was great and... 0.0 0.350 0.650 0.9186
17 Welly's is probably one of the busiest restaur... 0.0 0.509 0.491 0.9776

Generally, the VADER scores correlate with review stars, which supports using VADER to measure sentiment in these reviews.
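The agreement between stars and VADER scores can also be summarized with a single number. A minimal sketch, assuming a frame with `score` and `compound` columns like `df_clean`: a Spearman rank correlation, computed here as a Pearson correlation on ranks so that it needs only pandas. The toy values are made up for illustration.

```python
import pandas as pd

# Hypothetical star ratings and compound scores (illustration only).
toy = pd.DataFrame({
    'score':    [1, 1, 2, 3, 3, 4, 4, 5, 5, 5],
    'compound': [-0.76, -0.48, 0.10, 0.35, 0.62, 0.80, 0.55, 0.93, 0.98, 0.96],
})

# Spearman correlation equals the Pearson correlation of the ranks
# (tied values receive their average rank).
spearman_rho = toy['score'].rank().corr(toy['compound'].rank())

# Mean compound score per star level, as a quick monotonicity check.
mean_by_star = toy.groupby('score')['compound'].mean()

print(round(spearman_rho, 3))
print(mean_by_star)
```

A rank correlation is used because stars are ordinal and the compound score is bounded and skewed, so a linear (Pearson) correlation on the raw values could understate the relationship.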

Overall Sentiment by Region

Here, I evaluate overall sentiment by region, regardless of timeframe.

Return to Top: Overall Sentiment

Compound VADER Score - Overall by Region
In [25]:
all_county_compound_bootstrap = bootstrap_all(df_clean['city_x'].unique(), df_clean, 'city_x', 'compound')

In [26]:
# counties with significantly higher compound scores overall
all_county_compound_bootstrap[all_county_compound_bootstrap['p-value_high']<0.05].sort_values('Mean', ascending=False)
Out[26]:
Variable Mean Mean of Others Mean Difference 95% CI p-value_high p-value_low
1 Philadelphia, Pennsylvania 0.873979 0.788811 0.085168 [0.08077785041900973, 0.08952961915999184] 0.0000 1.0000
14 New York City, New York 0.853246 0.789944 0.063302 [0.05861237384448992, 0.06807079922183527] 0.0000 1.0000
16 Los Angeles, California 0.853244 0.789900 0.063344 [0.0585651897584104, 0.0680733043902514] 0.0000 1.0000
22 Cook, Illinois 0.843622 0.790768 0.052854 [0.047854348069286956, 0.05782488460563287] 0.0000 1.0000
3 Jefferson, Louisiana 0.839935 0.790932 0.049003 [0.044166633825323494, 0.05389760245786788] 0.0000 1.0000
2 Hartford, Connecticut 0.818161 0.792612 0.025549 [0.019364172046849225, 0.031567327151138164] 0.0000 1.0000
6 Hudson, New Jersey 0.813403 0.793135 0.020268 [0.01261969211589285, 0.02791421681859141] 0.0000 1.0000
10 Essex, New Jersey 0.810269 0.792893 0.017376 [0.01139138141654953, 0.023239607027634845] 0.0000 1.0000
19 Bergen, New Jersey 0.809339 0.792878 0.016461 [0.01084991702520183, 0.022103801972065327] 0.0000 1.0000
11 New Haven, Connecticut 0.803308 0.793265 0.010043 [0.003944477673485852, 0.016054235207837247] 0.0001 0.9999
In [27]:
# counties with significantly lower compound scores overall
all_county_compound_bootstrap[all_county_compound_bootstrap['p-value_low']<0.05].sort_values('Mean')
Out[27]:
Variable Mean Mean of Others Mean Difference 95% CI p-value_high p-value_low
18 Orleans, Louisiana 0.602798 0.795228 -0.192430 [-0.21422975363930313, -0.17065864630597327] 1.0000 0.0000
23 King, Washington 0.667522 0.795621 -0.128099 [-0.14261460972738235, -0.11342700897276661] 1.0000 0.0000
4 Nassau, New York 0.687086 0.794226 -0.107140 [-0.133010361358894, -0.08194399420302245] 1.0000 0.0000
7 Suffolk, New York 0.694652 0.796718 -0.102066 [-0.11197343766327063, -0.09195154922360023] 1.0000 0.0000
0 Middlesex, Massachusetts 0.714603 0.797328 -0.082725 [-0.09071223531785108, -0.0748992450679393] 1.0000 0.0000
8 Macomb, Michigan 0.733411 0.796187 -0.062776 [-0.07090089502375414, -0.05468129085249797] 1.0000 0.0000
12 Oakland, Michigan 0.735575 0.795732 -0.060157 [-0.06889297509805985, -0.051375903481605985] 1.0000 0.0000
21 Wayne, Michigan 0.736850 0.796009 -0.059160 [-0.0670723902438475, -0.05130345585855289] 1.0000 0.0000
15 Fairfield, Connecticut 0.758847 0.795243 -0.036397 [-0.04403464923824978, -0.028956047749932653] 1.0000 0.0000
20 Passaic, New Jersey 0.759222 0.794390 -0.035168 [-0.04646129610259268, -0.024246784310107537] 1.0000 0.0000
24 Union, New Jersey 0.774520 0.794464 -0.019944 [-0.027549210612551465, -0.012580126201735653] 1.0000 0.0000
5 Rockland, New York 0.774814 0.794436 -0.019622 [-0.027137285768681296, -0.01191886903691316] 1.0000 0.0000
9 Westchester, New York 0.775242 0.794652 -0.019410 [-0.026320059391107428, -0.01276297432439048] 1.0000 0.0000
17 Morris, New Jersey 0.786367 0.794155 -0.007788 [-0.01408926873120791, -0.001622453830824444] 0.9931 0.0069
In [28]:
# select the column before averaging so non-numeric columns are never aggregated
compound_restaurant_plot = restaurant_df.merge(df_clean.groupby('restaurant_id')['compound'].mean(), left_on='restaurant_id', right_index=True)
compound_restaurant_plot['text'] = compound_restaurant_plot['city_x'] + ' (' + compound_restaurant_plot['name'] + '): ' + round(compound_restaurant_plot['compound'],2).astype(str)
data = go.Scattergeo(lon = compound_restaurant_plot['longitude'],
                     lat = compound_restaurant_plot['latitude'],
                     text = compound_restaurant_plot['text'],
                     mode = 'markers',
                     marker_color = compound_restaurant_plot['compound'],
                    )
layout = dict(title = 'Distribution of Restaurants with Compound Score Variation Shown',
              geo_scope = 'usa')
choromap = go.Figure(data=[data], layout=layout)
iplot(choromap)
choromap.write_html('plotly_figures/restaurant_distribution_compound.html')

While most restaurants have high compound scores of 0.7-1 (green), restaurants with lower mean scores are shown in pink/red. Some of the regions with significantly higher mean compound scores include Philadelphia, Pennsylvania; New York City, New York; and Los Angeles, California. Some of the regions with significantly lower compound scores include Orleans, Louisiana; King, Washington; and Nassau, New York.

Positive VADER Score - Overall by Region
In [29]:
all_county_pos_bootstrap = bootstrap_all(df_clean['city_x'].unique(), df_clean, 'city_x', 'pos')

In [30]:
# counties with significantly higher positive scores overall
all_county_pos_bootstrap[all_county_pos_bootstrap['p-value_high']<0.05].sort_values('Mean', ascending=False)
Out[30]:
Variable Mean Mean of Others Mean Difference 95% CI p-value_high p-value_low
5 Rockland, New York 0.396398 0.372187 0.024211 [0.021062218492274253, 0.02728893093134454] 0.0000 1.0000
19 Bergen, New Jersey 0.384771 0.372365 0.012406 [0.010076199883068685, 0.01473888238337561] 0.0000 1.0000
15 Fairfield, Connecticut 0.382265 0.372632 0.009633 [0.006833570771595051, 0.012473774770293294] 0.0000 1.0000
9 Westchester, New York 0.382231 0.372580 0.009650 [0.007076921606878435, 0.012255351534690737] 0.0000 1.0000
10 Essex, New Jersey 0.380363 0.372643 0.007720 [0.005296228258989604, 0.010076122581355401] 0.0000 1.0000
20 Passaic, New Jersey 0.380208 0.372894 0.007314 [0.003087580440167945, 0.011582603644103818] 0.0003 0.9997
12 Oakland, Michigan 0.376679 0.372902 0.003777 [0.0005384553530575917, 0.006934408479183607] 0.0095 0.9905
17 Morris, New Jersey 0.376027 0.372864 0.003164 [0.0006624711495997805, 0.0056089400364047175] 0.0049 0.9951
24 Union, New Jersey 0.375808 0.372924 0.002884 [-5.043839661320875e-05, 0.005755536421096631] 0.0265 0.9735
In [31]:
# select the column before averaging so non-numeric columns are never aggregated
pos_restaurant_plot = restaurant_df.merge(df_clean.groupby('restaurant_id')['pos'].mean(), left_on='restaurant_id', right_index=True)
pos_restaurant_plot['text'] = pos_restaurant_plot['city_x'] + ' (' + pos_restaurant_plot['name'] + '): ' + round(pos_restaurant_plot['pos'],2).astype(str)
data = go.Scattergeo(lon = pos_restaurant_plot['longitude'],
                     lat = pos_restaurant_plot['latitude'],
                     text = pos_restaurant_plot['text'],
                     mode = 'markers',
                     marker_color = pos_restaurant_plot['pos'],
                    )
layout = dict(title = 'Distribution of Restaurants with Positive Score Variation Shown',
              geo_scope = 'usa')
choromap = go.Figure(data=[data], layout=layout)
iplot(choromap)
choromap.write_html('plotly_figures/restaurant_distribution_positive.html')

Positive sentiment by region tends to range, on average, from 0.32-0.4. Regions with significantly higher positive scores include Rockland, New York; Bergen, New Jersey; and Fairfield, Connecticut. Significantly lower positive scores are excluded since lower scores indicate neutrality, not negativity.

Negative VADER Score - Overall by Region
In [32]:
all_county_neg_bootstrap = bootstrap_all(df_clean['city_x'].unique(), df_clean, 'city_x', 'neg')

In [33]:
# counties with significantly higher negative scores overall
all_county_neg_bootstrap[all_county_neg_bootstrap['p-value_high']<0.05]
Out[33]:
Variable Mean Mean of Others Mean Difference 95% CI p-value_high p-value_low
0 Middlesex, Massachusetts 0.065436 0.055436 0.010000 [0.008548087503426303, 0.011464713886861] 0.0000 1.0000
4 Nassau, New York 0.069143 0.055809 0.013334 [0.008736076062500531, 0.017944556222490516] 0.0000 1.0000
7 Suffolk, New York 0.071529 0.055399 0.016130 [0.014230156505079495, 0.01806627426976646] 0.0000 1.0000
8 Macomb, Michigan 0.064661 0.055513 0.009148 [0.0076507493439311805, 0.01061105918547933] 0.0000 1.0000
9 Westchester, New York 0.057174 0.055804 0.001370 [8.90880580130453e-05, 0.00265157368245984] 0.0197 0.9803
12 Oakland, Michigan 0.063670 0.055602 0.008068 [0.00638150183637689, 0.00974267482786758] 0.0000 1.0000
15 Fairfield, Connecticut 0.059755 0.055702 0.004053 [0.002614971469060173, 0.005495127774840707] 0.0000 1.0000
18 Orleans, Louisiana 0.081527 0.055670 0.025857 [0.021869652481574634, 0.03000164622114672] 0.0000 1.0000
20 Passaic, New Jersey 0.058200 0.055825 0.002375 [0.0002686794613891199, 0.004489403234177783] 0.0149 0.9851
21 Wayne, Michigan 0.063186 0.055578 0.007608 [0.0060559926917489095, 0.00913404724295446] 0.0000 1.0000
23 King, Washington 0.075158 0.055582 0.019576 [0.016839625872215644, 0.022325236240131423] 0.0000 1.0000
In [34]:
# select the column before averaging so non-numeric columns are never aggregated
neg_restaurant_plot = restaurant_df.merge(df_clean.groupby('restaurant_id')['neg'].mean(), left_on='restaurant_id', right_index=True)
neg_restaurant_plot['text'] = neg_restaurant_plot['city_x'] + ' (' + neg_restaurant_plot['name'] + '): ' + round(neg_restaurant_plot['neg'],2).astype(str)
data = go.Scattergeo(lon = neg_restaurant_plot['longitude'],
                     lat = neg_restaurant_plot['latitude'],
                     text = neg_restaurant_plot['text'],
                     mode = 'markers',
                     marker_color = neg_restaurant_plot['neg'],
                    )
layout = dict(title = 'Distribution of Restaurants with Negative Score Variation Shown',
              geo_scope = 'usa')
choromap = go.Figure(data=[data], layout=layout)
iplot(choromap)
choromap.write_html('plotly_figures/restaurant_distribution_negative.html')

Negative sentiment by region tends to range, on average, from 0.04 to 0.08. Regions with significantly higher negative VADER scores include Orleans, Louisiana; King, Washington; and Suffolk, New York, which have the three highest means in the table above. Significantly lower negative scores are excluded since lower scores indicate neutrality, not positivity.

Overall Sentiment Conclusions

While there are trends by region, e.g., the overall most positive reviews occur in Philadelphia and New York and the most negative reviews occur in New Orleans and Seattle, zooming in on specific regions shows variation at the restaurant level, making it difficult to draw conclusions about trends in specific regions.
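To make the restaurant-level variation concrete, one option is to average scores per restaurant first and then summarize the spread of those restaurant means within each county. The sketch below assumes columns named `city_x`, `restaurant_id`, and `compound` as in `df_clean`; the rows and county labels are invented for illustration.

```python
import pandas as pd

# Hypothetical reviews for two counties with two restaurants each.
toy = pd.DataFrame({
    'city_x': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
    'restaurant_id': ['a1', 'a1', 'a2', 'a2', 'b1', 'b1', 'b2', 'b2'],
    'compound': [0.9, 0.7, -0.2, 0.0, 0.6, 0.8, 0.7, 0.5],
})

# Step 1: mean compound score per restaurant.
restaurant_means = (toy.groupby(['city_x', 'restaurant_id'])['compound']
                       .mean()
                       .reset_index())

# Step 2: spread of the restaurant means within each county.
county_spread = restaurant_means.groupby('city_x')['compound'].agg(['mean', 'min', 'max'])
county_spread['range'] = county_spread['max'] - county_spread['min']
print(county_spread)
```

In this toy data, county A has the lower county mean but also a much wider range across restaurants, which is the kind of pattern that makes a single regional average hard to interpret.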


Sentiment Pre- and Post-Lockdown

How has the distribution of review stars and of compound, positive, and negative VADER scores changed from before lockdown to after?

Here, I examine how review stars and VADER scores differ before and after lockdown, where the start of lockdown is defined as March 15, 2020.
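The `Post-COVID Lockdown` flag used in the cells that follow is created earlier in the notebook. For reference, a flag of this kind could be derived from `publish_date` roughly as sketched below. The inclusive cutoff (`>=`) is an assumption, although it is consistent with one of the sampled post-lockdown reviews shown later being dated 2020-03-15; the `LOCKDOWN_DATE` name and the toy dates are illustrative.

```python
import pandas as pd

# Assumed cutoff: reviews published on or after this date count as post-lockdown.
LOCKDOWN_DATE = pd.Timestamp('2020-03-15')

# Hypothetical publish dates (illustration only).
toy = pd.DataFrame({'publish_date': ['2020-01-24', '2020-03-14', '2020-03-15', '2020-04-11']})

# Parse the date strings, then flag each review as 0 (pre) or 1 (post).
toy['publish_date'] = pd.to_datetime(toy['publish_date'])
toy['Post-COVID Lockdown'] = (toy['publish_date'] >= LOCKDOWN_DATE).astype(int)
print(toy)
```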

Return to Top: Sentiment Analysis

In [35]:
# create subset dfs based on covid lockdown
post_covid = df_clean[df_clean['Post-COVID Lockdown']==1]
pre_covid = df_clean[df_clean['Post-COVID Lockdown']==0]

Review Stars

Yelp reviews require the customer to assign a 1-5 star rating, 1 being the worst and 5 being the best.

Return to Top: Sentiment Pre- and Post-Lockdown

In [36]:
plt.figure(figsize=(8,5))
plt.bar(pre_covid['score'].value_counts(normalize=True).sort_index().index, 
        pre_covid['score'].value_counts(normalize=True).sort_index(),
        alpha=0.5, color='palevioletred', width=1, label='Pre-Lockdown')
plt.bar(post_covid['score'].value_counts(normalize=True).sort_index().index, 
        post_covid['score'].value_counts(normalize=True).sort_index(),
        alpha=0.5, color='mediumseagreen', width=1, label='Post-Lockdown')
plt.title('Distribution of Review Stars')
plt.xlabel('Review Stars')
plt.ylabel('Proportion of Reviews')
plt.legend()
plt.tight_layout()

We can see that the proportion of 5-star reviews has increased post-lockdown; however, the proportion of 1-star reviews appears to be about the same, and the proportion of mid-range (neutral) reviews has decreased.
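The shift visible in the bar chart can also be read off as numbers. A minimal sketch, assuming `score` and `Post-COVID Lockdown` columns as in `df_clean`, with made-up rows: `pd.crosstab` with `normalize='index'` gives the share of each star rating within each period, and the difference between the two rows gives the change in percentage points.

```python
import pandas as pd

# Hypothetical reviews: five pre-lockdown (0) and five post-lockdown (1).
toy = pd.DataFrame({
    'score': [5, 5, 4, 3, 1, 5, 5, 5, 1, 4],
    'Post-COVID Lockdown': [0, 0, 0, 0, 0, 1, 1, 1, 1, 1],
})

# Share of each star rating within the pre (0) and post (1) groups.
star_share = pd.crosstab(toy['Post-COVID Lockdown'], toy['score'], normalize='index')

# Change in share from pre- to post-lockdown, in percentage points.
change_pp = (star_share.loc[1] - star_share.loc[0]) * 100
print(star_share)
print(change_pp)
```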

In [37]:
# random sample of 5-star reviews post-lockdown
for idx, row in post_covid[post_covid['score']==5].sample(5).iterrows():
    print('\n---Restaurant Name: {}---'.format(row['name']))
    print('City: {}'.format(row['city_x']))
    print('Review Stars: {}'.format(row['score']))
    print('Review Date: {}\n'.format(row['publish_date']))
    print(row['description'])
---Restaurant Name: Mecha Noodle Bar---
City: Fairfield, Connecticut
Review Stars: 5.0
Review Date: 2020-03-18

Love it.  There must be a million calories in these dishes - but you only live once.  Crowds are amazing because its so good and so you'll find yourself eating somewhat communally depending on the size of your group - but it'll be so good you won't care - and its fun (now lets just pray that Corona goes away fast....) Pray.

---Restaurant Name: Hometown Bar-B-Que---
City: New York City, New York
Review Stars: 5.0
Review Date: 2020-04-11

Made our way to Fairway and Hometown BBQ today.  Talk about killing two birds with one stone.  Absolutely fantastic.  If there is a better BBQ joint in NYC, somebody kindly let me know.  

P.S.  What a pathetic review by Sen-Pei.  You should be ashamed of yourself.  Really.  Restaurants and their employees are suffering enough and you decide to post a one star review during a pandemic AND race bait on top of it?

---Restaurant Name: Sandra's Next Generation---
City: New Haven, Connecticut
Review Stars: 5.0
Review Date: 2020-03-15

I have passed by this place over 1000 times, but have Never been inside to eat. My first time going in and eating was fantastic. OMG!  It was crowded but the staff handled the flow of traffic perfectly. The place smelled so good. We were seated within 25 minutes. The service was excellent and our meals were Delicious. Even the little 10 year old (staff person in training) was very attentive and helpful. We had a great experience AND we got treated to some extra delicious cornbread to go.

---Restaurant Name: Shell & Bones Oyster Bar & Grill---
City: New Haven, Connecticut
Review Stars: 5.0
Review Date: 2020-04-17

Great News!
After a short intermission Shell and Bones Oyster Bar &amp; Grill has started Curbside pickups and Local Delivery.
Starting 4/17/20  &amp;. 4/18/20 3:00pm-9:00pm with a menu of their "greatest hits "!
Check out their website for more information and menu.
Here's wishing "theoysterer" and crew success!

---Restaurant Name: Da Nina's Italian Restaurant---
City: Rockland, New York
Review Stars: 5.0
Review Date: 2020-04-04

Ordered takeout and had a delicious dinner delivered. Food is just as great as when you went there to eat it. I wish for safety and health for all those still working there. Looking forward to dining in again, but this was a great substitute for the time being!


Delicious Italian food! As a Vegan, it's hard to find a place that takes the time to look at their menu and create something for you. Our server, Abel, was fantastic. He suggested the Gnocchi dish with plain tomato sauce (the menu version comes prepared with meat and cheese). I didn't want to come up with a bunch of tweaks to a menu item, ask a ton of questions and be afraid that they'd take me seriously. I was ready to order 4 sides to create a meal, but Abel's suggestion made my dinner special. (Speaking of sides, their spinach dish is the best I've ever had. Anywhere. The spinach itself has a grilled flavor, and is sautéed with olive oil and garlic. Their sautéed hot peppers were sooo good too- they're hot temperature-wise and heat-wise. Not recommended for) I dined with a non-Vegan, and he loved his food too. The restaurant is cozy and warm, and the staff is wonderful and welcoming. Can't wait to go back!
In [38]:
# sample of 1-star reviews post-lockdown
for idx, row in post_covid[post_covid['score']==1].sample(5).iterrows():
    print('\n---Restaurant Name: {}---'.format(row['name']))
    print('City: {}'.format(row['city_x']))
    print('Review Stars: {}'.format(row['score']))
    print('Review Date: {}\n'.format(row['publish_date']))
    print(row['description'])
---Restaurant Name: Qing Xiang Yuan Dumplings---
City: Cook, Illinois
Review Stars: 1.0
Review Date: 2020-03-16

My post got removed because it was deemed as wanting attention. I only spoke of my experience and there about how they discriminated against me as an Asian. Only because I was wearing a face mask. It was poor customer service on the staff and it should be known. 

There has been other incidents where Asians were forced to leave or refused entry by the manager. This is because they were also wearing face masks. This is a case of discrimination due to recent events and it should not be tolerated.

---Restaurant Name: Mighty Quinn's Barbeque---
City: Essex, New Jersey
Review Stars: 1.0
Review Date: 2020-03-22

Got barbecue for take out got home to a bowl of pure lard...I eat bbq all the time but this was a discraceful, sad pile of lard...

---Restaurant Name: Nonna's---
City: Morris, New Jersey
Review Stars: 1.0
Review Date: 2020-04-12

Ordered an Italian sub and the bread was stale I could barely bite through. Also it's kind of small for the price.

---Restaurant Name: Texas Roadhouse---
City: New Haven, Connecticut
Review Stars: 1.0
Review Date: 2020-04-19

We ordered our food at around 6:30. We noticed that people who orders after us were getting their food before us. When I called them up while we were waiting, nobody even picked up the phone and nobody even came to check our order. This is the last time I will be ordering food from here. Never coming back here again!

---Restaurant Name: nonono---
City: New York City, New York
Review Stars: 1.0
Review Date: 2020-04-01

I got chicken thigh and gizzard yakitori as well as two boxes of yakisoba through caviar. I had high expectations but unfortunately the food was really bad. Both yakitori were very dried. The yakisoba did not have any flavor at all. I only gave one star because of the packaging as it was very well packed despite the food having a mediocre taste

Compound VADER Scores

Here, I compare compound scores before and after lockdown.

Return to Top: Sentiment Pre- and Post-Lockdown

In [39]:
plt.figure(figsize=(8,5))
# normalize each histogram (density=True) so groups of different size are comparable
plt.hist(pre_covid['compound'], alpha=0.5, color='palevioletred', bins=10, label='Pre-Lockdown', density=True)
plt.hist(post_covid['compound'], alpha=0.5, color='mediumseagreen', bins=10, label='Post-Lockdown', density=True)
plt.title('Distribution of Compound Scores')
plt.xlabel('Compound VADER Score')
plt.ylabel('Probability Density')
plt.legend()
plt.tight_layout()
In [40]:
# plot distribution of compound scores by review stars  pre- and post-lockdown

fig, axes = plt.subplots(5, 1, figsize=(10, 15))

sns.distplot(pre_covid.loc[pre_covid['score']==1,'compound'], hist=False, norm_hist=True, color='palevioletred', label='1 Star Pre-Lockdown', ax=axes[0])
sns.distplot(post_covid.loc[post_covid['score']==1,'compound'], hist=False, norm_hist=True, color='mediumseagreen', label='1 Star Post-Lockdown', ax=axes[0])
sns.distplot(pre_covid.loc[pre_covid['score']==2,'compound'], hist=False, norm_hist=True, color='palevioletred', label='2 Stars Pre-Lockdown', ax=axes[1])
sns.distplot(post_covid.loc[post_covid['score']==2,'compound'], hist=False, norm_hist=True, color='mediumseagreen', label='2 Stars Post-Lockdown', ax=axes[1])
sns.distplot(pre_covid.loc[pre_covid['score']==3,'compound'], hist=False, norm_hist=True, color='palevioletred', label='3 Stars Pre-Lockdown', ax=axes[2])
sns.distplot(post_covid.loc[post_covid['score']==3,'compound'], hist=False, norm_hist=True, color='mediumseagreen', label='3 Stars Post-Lockdown', ax=axes[2])
sns.distplot(pre_covid.loc[pre_covid['score']==4,'compound'], hist=False, norm_hist=True, color='palevioletred', label='4 Stars Pre-Lockdown', ax=axes[3])
sns.distplot(post_covid.loc[post_covid['score']==4,'compound'], hist=False, norm_hist=True, color='mediumseagreen', label='4 Stars Post-Lockdown', ax=axes[3])
sns.distplot(pre_covid.loc[pre_covid['score']==5,'compound'], hist=False, norm_hist=True, color='palevioletred', label='5 Stars Pre-Lockdown' ,ax=axes[4])
sns.distplot(post_covid.loc[post_covid['score']==5,'compound'], hist=False, norm_hist=True, color='mediumseagreen', label='5 Stars Post-Lockdown' ,ax=axes[4])

axes[0].set_title('Compound VADER Score Pre- and Post-Lockdown by Review Stars')
for ax in range(5):
    axes[ax].set_xlabel('')
for ax in range(5):
    axes[ax].set_xlim(-1,1)

fig.add_subplot(111, frame_on=False)
plt.tick_params(labelcolor="none", bottom=False, left=False)
plt.ylabel('Probability Density')
plt.xlabel('Compound VADER Score')
plt.grid(False)

plt.tight_layout()
In [41]:
# number of reviews with a compound score in each period
compound_volume = df_clean.groupby('Post-COVID Lockdown')['compound'].count()

fig, axs = plt.subplots(2, figsize=(10,10), sharex=True)
sns.boxplot(x='Post-COVID Lockdown', y='compound', data=df_clean, 
            palette=['palevioletred', 'mediumseagreen'], showfliers=False, ax=axs[0])
sns.barplot(x=compound_volume.index, y=compound_volume, 
            palette=['palevioletred', 'mediumseagreen'], ax=axs[1], ec='k')

axs[0].set_title('Compound VADER Scores Pre- and Post-Lockdown')
axs[0].set_xlabel('')
axs[0].set_ylabel('Compound VADER Score')
axs[1].set_xlabel('')
axs[1].set_ylabel('Number of Reviews')

plt.xticks([0, 1], ['Pre-Lockdown', 'Post-Lockdown'])
plt.tight_layout()
In [42]:
# compare compound VADER score before and after lockdown
mean_diff, conf_int, p_low, p_high = bootstrap(pre_covid['compound'], post_covid['compound'])
if p_low < 0.05:
    print('The compound VADER score significantly decreased by a mean difference of {:.3f} in post-lockdown reviews; p={}'.format(np.abs(mean_diff), p_low))
elif p_high <0.05:    
    print('The compound VADER score significantly increased by a mean difference of {:.3f} in post-lockdown reviews; p={}'.format(np.abs(mean_diff), p_high))
else:
    print('No significant change in compound VADER score; p={}'.format(p_low))
The compound VADER score significantly decreased by a mean difference of 0.030 in post-lockdown reviews; p=0.0

The compound VADER score is significantly lower in post-lockdown reviews, indicating that sentiment has declined. To determine whether this is due to a decrease in positive sentiment or an increase in negative sentiment, I evaluate the positive and negative VADER scores the same way.
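The `bootstrap` helper called above is defined earlier in the notebook and is not reproduced here. As a rough illustration only, a percentile bootstrap for a difference in means might look like the sketch below. The function name `bootstrap_mean_diff`, its parameters, and the one-sided p-value convention are assumptions chosen to mirror how the four return values are used above; they are not the notebook's actual implementation.

```python
import numpy as np

def bootstrap_mean_diff(before, after, n_boot=10_000, alpha=0.05, seed=0):
    """Hypothetical stand-in for the notebook's `bootstrap` helper.

    Returns (mean_diff, conf_int, p_low, p_high), where mean_diff is
    mean(after) - mean(before).
    """
    rng = np.random.default_rng(seed)
    before = np.asarray(before, dtype=float)
    after = np.asarray(after, dtype=float)
    observed = after.mean() - before.mean()

    # Resample each group with replacement and record the difference in means.
    diffs = np.empty(n_boot)
    for i in range(n_boot):
        b = rng.choice(before, size=before.size, replace=True)
        a = rng.choice(after, size=after.size, replace=True)
        diffs[i] = a.mean() - b.mean()

    # Percentile confidence interval for the difference in means.
    conf_int = np.percentile(diffs, [100 * alpha / 2, 100 * (1 - alpha / 2)])

    # Assumed convention: one-sided p-values are the share of resampled
    # differences that fall on the opposite side of zero.
    p_low = np.mean(diffs >= 0)   # small when `after` is reliably lower
    p_high = np.mean(diffs <= 0)  # small when `after` is reliably higher
    return observed, conf_int, p_low, p_high
```

Under this convention, `p_low < 0.05` reads as "scores decreased after lockdown" and `p_high < 0.05` as "scores increased", which matches the branching in the cell above. Other conventions, such as a permutation test or a two-sided p-value, would be equally defensible.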

Of note, the data are heavily imbalanced: there are substantially more reviews pre-lockdown than post-lockdown. A future direction might be to select one random pre-lockdown review for every post-lockdown review of a given restaurant to create a more balanced dataset.
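The balancing idea described above can be sketched as follows. This is a minimal illustration on toy data, not part of the original analysis. It assumes a restaurant identifier column (here `alias`, which appears in the raw data) and the 0/1 `Post-COVID Lockdown` flag; the helper name `balance_by_restaurant` is hypothetical.

```python
import pandas as pd

def balance_by_restaurant(df, id_col="alias", flag_col="Post-COVID Lockdown", seed=0):
    """For each restaurant, keep n = min(#pre, #post) randomly chosen
    reviews from each period; restaurants missing a period are dropped."""
    pieces = []
    for _, group in df.groupby(id_col):
        post = group[group[flag_col] == 1]
        pre = group[group[flag_col] == 0]
        n = min(len(pre), len(post))
        if n == 0:
            continue  # no reviews in one of the two periods
        pieces.append(pre.sample(n=n, random_state=seed))
        pieces.append(post.sample(n=n, random_state=seed))
    if not pieces:
        return df.iloc[0:0]  # empty frame with the same columns
    return pd.concat(pieces, ignore_index=True)

# Toy data: restaurant "a" has 4 pre / 1 post, "b" has 1 pre / 2 post, "c" has no post reviews.
toy = pd.DataFrame({
    "alias": ["a"] * 5 + ["b"] * 3 + ["c"] * 2,
    "Post-COVID Lockdown": [0, 0, 0, 0, 1, 0, 1, 1, 0, 0],
    "compound": [0.9, 0.8, -0.2, 0.5, 0.1, 0.7, -0.6, 0.3, 0.4, 0.2],
})
balanced = balance_by_restaurant(toy)
print(balanced["Post-COVID Lockdown"].value_counts())
```

The trade-off is that restaurants without any post-lockdown reviews drop out entirely, so the balanced sample may be much smaller than the full dataset.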

In [43]:
# highest compound scores post-lockdown
for idx, row in post_covid[post_covid['compound']==post_covid['compound'].max()].iterrows():
    print('\n---Restaurant Name: {}---'.format(row['name']))
    print('City: {}'.format(row['city_x']))
    print('Review Score: {}'.format(row['score']))
    print('Review Date: {}\n'.format(row['publish_date']))
    print(row['description'])
---Restaurant Name: Farmhouse---
City: Los Angeles, California
Review Score: 5.0
Review Date: 2020-03-26

It was a rainy day in Los Angeles, yes really! It was actually a gray, cold, drizzly day in the southland and after a lazy afternoon of wandering around the spacious and empty newly-renovated Beverly Center, we decided to pop into Farmhouse for a pre-dinner nosh and a cocktail. (Mind you the shopping center was filled with merchandise and light, the new skylights installed on the ceiling of the center bring in a LOT of natural light and makes the center seem like one is outdoors instead of inside. But sadly the center is devoid of shoppers or even of the dreaded lookie-loos, which in this case would be better than just standing around doing nothing for nobody. More on that in another post.)   

As a point of reference, my boyfriend, and I are big fans of happy hours and we are always on the look out to try a new bar or restaurant at an introductory price point. Although we both possess excellent kitchen skills and we enjoy our time preparing our meals, and despite the fact that we were headed home to prepare our dinner, we just wanted something now to tide us over as well as to give this place a try. 

Farmhouse opened in March 2018 so it is by no means the new kid on the block. But it was new to me as I very seldom get over the hill from Burbank (where we live) and since working in Hollywood, I've been far less inclined to venture back over the hill for dinners or brunch meetings. But, having just been laid off, hence the mid-week shopping trip, this Happy Hour was a welcome treat! 

We were fortunate enough to be served by Jake at the bar who was an expert at finding our drink tastes and suggested the perfect cocktails for us both. I am a lover of gin and was hence poured the Bee's Knees (gin, honey and lemon (amazing). My bf loves bourbon and we both love a good Old Fashioned... so he was mixed an incredible Maple Old Fashioned which is simply delicious with maple syrup, bourbon and those Angostura bitters!.  An incredible drink, true perfection. For some light bites we snacked on the Cauliflower Popcorn - which we could not get enough of! Light, airy, crispy and delicious! There was so much more we wanted to eat we were blown away and can't wait to go back for dinner. 

While we were there we were lucky enough to meet the owner and executive chef Nathan Peitso. He was unassuming, kind and helpful. We have met many owners and executive chefs over the years and my bf works for one of the largest restaurant groups in LA as a manager so he knows service and management. And we both admitted this owner was by far and above the coolest and nicest ones we'd met. He talked to us about the menu choices, the cool decor and the vibe of the place. All of which we felt was first rate and 5 stars all around. Great job Nathan! You have a built a terrific place. 

The dining room is filled with large, open tables and soft fabrics. Old farmhouse antiques dot the walls with copper pots and open shelves display gleaming china and glassware. It's an open air kitchen that allows everyone to see the organized ballet as they move swiftly between the pans, pots and serving stations. Seamless and swift! Quiet yet animated. A lovely show to behold and everything is clean, shiny and new. 

The long, L-shaped bar greets you as you enter and comfy bar stools line the perimeter. They even offer charging for your phone should you need it. My iPhone was at 3% and about to die and I asked if he could charge my phone. He said absolutely and I expected to have to hand it over to be charged behind the bar. Instead a cable was extended over to me and I could charge while I was seated in my stool enjoying my drink! Thank you Jake! Amazing! 

We can't wait to go back for dinner when they reopen post Pandemic. This place is a true find and takes the notion of farm to table to the next level. Whether you are going in for a small bite or a full course meal, they will take care of you, make you feel like home and feed you like you are family. 

I really can't say enough good about this place. I've held on to my receipt since we went with the intent of writing this review. And, now that we are under home isolation with the Covid-19 lock-down, I find myself with a little more time on my hands. 

I encourage EVERYONE to go and have a meal or at the very least drinks at FARMHOUSE! And, they offer FREE VALET while you eat there. Your first FOUR HOURS ARE FREE!!! Just pull into their valet entrances off Beverly Blvd. or La Cienega Blvd. and get a validation. How nice is that? 

And if you going to the bar, ask for a Bee's Knees or a Maple Old Fashioned and you will NOT be disappointed! Or order anything else they pour- it's all great! Enjoy and stay home, stay healthy and we'll see you all at the Farmhouse very soon.
In [44]:
# lowest compound scores post-lockdown
for idx, row in post_covid[post_covid['compound']==post_covid['compound'].min()].iterrows():
    print('\n---Restaurant Name: {}---'.format(row['name']))
    print('City: {}'.format(row['city_x']))
    print('Review Score: {}'.format(row['score']))
    print('Review Date: {}\n'.format(row['publish_date']))
    print(row['description'])
---Restaurant Name: The Cheesecake Factory---
City: Passaic, New Jersey
Review Score: 1.0
Review Date: 2020-04-18

Normally cheesecake is fine besides them forgetting cake or items on pick up which has been an issue but today is different. I placed an order online for curbside pickup, when you go onto the site it says something along the lines of "all orders will be curbside pick up and you will be texted when your order is ready" So I place my order and the message on my order as well as in email form says come inside to bakery counter, upon several calls to clarify during the pandemic no one answered as per their usual unprofessionalism/poor customer service, after a while I got a fax machine sound so I hung up. Upon arriving to a load of cars I go inside to way too many people probably also confused to what to do. I wait in a long line to get to the front and after 20 minutes they tell me they can't find the order, I tell them what it is and they have no recollection and tell me to tell them what the order is and have to remake it. By now I'm furious and after a while they finally tell me to wait in my car and they will bring it to me. During all of this one of the runners had no mask on and kept taking it off. This is piss poor service on a regular day but on take out only during a PANDEMIC not only did the staff, management and corporate all fail but they failed miserably to protect the lives of their patrons during this terribly hard time. 

I am an essential worker on the frontlines fighting corona so before you say it's hard on everyone blah blah I am there and way deeper into this than sitting in. Restaurant making mistakes that CAN and SHOULD be avoided to protect lives. This is not joke, not a game and these things can not happen while hundreds are losing their lives a day in this area, fix it or close down until after this is over so you can go back to screwing up orders, not now. 

Avoid this place like the plague AT LEAST until this is over, they have lost my business forever. Anyone familiar w me or my reviews knows i typically never leave a 1 Star or write something like this. Awful.

Overall, there is a decrease in sentiment post-lockdown; however, this is difficult to interpret, since far fewer post-lockdown reviews are available in the current data.

Positive VADER Scores

Here, I compare positive scores before and after lockdown.

Return to Top: Sentiment Pre- and Post-Lockdown

In [45]:
# add artificial max to post-covid data so scale is the same
# .copy() avoids modifying a slice of post_covid; pd.concat replaces the removed DataFrame.append
artificial_max = post_covid[post_covid['pos']==post_covid['pos'].max()].copy()
artificial_max['pos'] = 1
post_covid_plot = pd.concat([post_covid, artificial_max])

plt.figure(figsize=(8,5))
plt.hist(pre_covid['pos'], alpha=0.5, color='palevioletred', bins=10, label='Pre-Lockdown', density=True)
plt.hist(post_covid_plot['pos'], alpha=0.5, color='mediumseagreen', bins=10, label='Post-Lockdown', density=True)
plt.title('Distribution of Positive Scores')
plt.xlabel('Positive VADER Score')
plt.ylabel('Frequency')
plt.legend()
plt.tight_layout()
In [46]:
# plot distribution of positive scores by review for pre- or post-lockdown

fig, axes = plt.subplots(5, 1, figsize=(10, 15))

sns.distplot(pre_covid.loc[pre_covid['score']==1,'pos'], hist=False, norm_hist=True, color='palevioletred', label='1 Star Pre-Lockdown', ax=axes[0])
sns.distplot(post_covid.loc[post_covid['score']==1,'pos'], hist=False, norm_hist=True, color='mediumseagreen', label='1 Star Post-Lockdown', ax=axes[0])
sns.distplot(pre_covid.loc[pre_covid['score']==2,'pos'], hist=False, norm_hist=True, color='palevioletred', label='2 Stars Pre-Lockdown', ax=axes[1])
sns.distplot(post_covid.loc[post_covid['score']==2,'pos'], hist=False, norm_hist=True, color='mediumseagreen', label='2 Stars Post-Lockdown', ax=axes[1])
sns.distplot(pre_covid.loc[pre_covid['score']==3,'pos'], hist=False, norm_hist=True, color='palevioletred', label='3 Stars Pre-Lockdown', ax=axes[2])
sns.distplot(post_covid.loc[post_covid['score']==3,'pos'], hist=False, norm_hist=True, color='mediumseagreen', label='3 Stars Post-Lockdown', ax=axes[2])
sns.distplot(pre_covid.loc[pre_covid['score']==4,'pos'], hist=False, norm_hist=True, color='palevioletred', label='4 Stars Pre-Lockdown', ax=axes[3])
sns.distplot(post_covid.loc[post_covid['score']==4,'pos'], hist=False, norm_hist=True, color='mediumseagreen', label='4 Stars Post-Lockdown', ax=axes[3])
sns.distplot(pre_covid.loc[pre_covid['score']==5,'pos'], hist=False, norm_hist=True, color='palevioletred', label='5 Stars Pre-Lockdown' ,ax=axes[4])
sns.distplot(post_covid.loc[post_covid['score']==5,'pos'], hist=False, norm_hist=True, color='mediumseagreen', label='5 Stars Post-Lockdown' ,ax=axes[4])

axes[0].set_title('Positive VADER Score Pre- and Post-Lockdown by Review Stars')
for ax in range(5):
    axes[ax].set_xlabel('')
for ax in range(5):
    axes[ax].set_xlim(0,1)

fig.add_subplot(111, frame_on=False)
plt.tick_params(labelcolor="none", bottom=False, left=False)
plt.ylabel('Probability Density')
plt.xlabel('Positive VADER Score')
plt.grid(False)

plt.tight_layout()
In [47]:
# number of reviews with a positive score in each period
pos_volume = df_clean.groupby('Post-COVID Lockdown')['pos'].count()

fig, axs = plt.subplots(2, figsize=(10,10), sharex=True)
sns.boxplot(x='Post-COVID Lockdown', y='pos', data=df_clean, 
            palette=['palevioletred', 'mediumseagreen'], showfliers=False, ax=axs[0])
sns.barplot(x=pos_volume.index, y=pos_volume, 
            palette=['palevioletred', 'mediumseagreen'], ax=axs[1], ec='k')

axs[0].set_title('Positive VADER Scores Pre- and Post-Lockdown')
axs[0].set_xlabel('')
axs[0].set_ylabel('Positive VADER Score')
axs[1].set_xlabel('')
axs[1].set_ylabel('Number of Reviews')

plt.xticks([0, 1], ['Pre-Lockdown', 'Post-Lockdown'])
plt.tight_layout()
In [48]:
mean_diff, conf_int, p_low, p_high = bootstrap(pre_covid['pos'], post_covid['pos'])
if p_low < 0.05:
    print('The positive VADER score significantly decreased by a mean difference of {:.3f} in post-lockdown reviews; p={}'.format(np.abs(mean_diff), p_low))
elif p_high <0.05:    
    print('The positive VADER score significantly increased by a mean difference of {:.3f} in post-lockdown reviews; p={}'.format(np.abs(mean_diff), p_high))
else:
    print('No significant change in positive VADER score; p={}'.format(p_low))
No significant change in positive VADER score; p=0.7416

Positive scores behave very similarly before and after lockdown. There is a slight right-shift in the distribution towards more positive scores, but the difference is not significant.

Negative VADER Scores

Here, I compare negative scores before and after lockdown.

Return to Top: Sentiment Pre- and Post-Lockdown

In [49]:
# add artificial max to pre- and post-covid data so scale is the same
# .copy() avoids modifying a slice; pd.concat replaces the removed DataFrame.append
artificial_max_post = post_covid[post_covid['neg']==post_covid['neg'].max()].copy()
artificial_max_post['neg'] = 1
post_covid_plot = pd.concat([post_covid, artificial_max_post])

artificial_max_pre = pre_covid[pre_covid['neg']==pre_covid['neg'].max()].copy()
artificial_max_pre['neg'] = 1
# build the pre-lockdown plot data from pre_covid (not post_covid)
pre_covid_plot = pd.concat([pre_covid, artificial_max_pre])

plt.figure(figsize=(8,5))
plt.hist(pre_covid_plot['neg'], alpha=0.5, color='palevioletred', bins=10, label='Pre-Lockdown', density=True)
plt.hist(post_covid_plot['neg'], alpha=0.5, color='mediumseagreen', bins=10, label='Post-Lockdown', density=True)
plt.title('Distribution of Negative Scores')
plt.xlabel('Negative VADER Score')
plt.ylabel('Frequency')
plt.legend()
plt.tight_layout()
In [50]:
# plot distribution of negative scores by review for pre- or post-lockdown

fig, axes = plt.subplots(5, 1, figsize=(10, 15))

sns.distplot(pre_covid.loc[pre_covid['score']==1,'neg'], hist=False, norm_hist=True, color='palevioletred', label='1 Star Pre-Lockdown', ax=axes[0])
sns.distplot(post_covid.loc[post_covid['score']==1,'neg'], hist=False, norm_hist=True, color='mediumseagreen', label='1 Star Post-Lockdown', ax=axes[0])
sns.distplot(pre_covid.loc[pre_covid['score']==2,'neg'], hist=False, norm_hist=True, color='palevioletred', label='2 Stars Pre-Lockdown', ax=axes[1])
sns.distplot(post_covid.loc[post_covid['score']==2,'neg'], hist=False, norm_hist=True, color='mediumseagreen', label='2 Stars Post-Lockdown', ax=axes[1])
sns.distplot(pre_covid.loc[pre_covid['score']==3,'neg'], hist=False, norm_hist=True, color='palevioletred', label='3 Stars Pre-Lockdown', ax=axes[2])
sns.distplot(post_covid.loc[post_covid['score']==3,'neg'], hist=False, norm_hist=True, color='mediumseagreen', label='3 Stars Post-Lockdown', ax=axes[2])
sns.distplot(pre_covid.loc[pre_covid['score']==4,'neg'], hist=False, norm_hist=True, color='palevioletred', label='4 Stars Pre-Lockdown', ax=axes[3])
sns.distplot(post_covid.loc[post_covid['score']==4,'neg'], hist=False, norm_hist=True, color='mediumseagreen', label='4 Stars Post-Lockdown', ax=axes[3])
sns.distplot(pre_covid.loc[pre_covid['score']==5,'neg'], hist=False, norm_hist=True, color='palevioletred', label='5 Stars Pre-Lockdown' ,ax=axes[4])
sns.distplot(post_covid.loc[post_covid['score']==5,'neg'], hist=False, norm_hist=True, color='mediumseagreen', label='5 Stars Post-Lockdown' ,ax=axes[4])

axes[0].set_title('Negative VADER Score Pre- and Post-Lockdown by Review Stars')
for ax in range(5):
    axes[ax].set_xlabel('')
for ax in range(5):
    axes[ax].set_xlim(0,1)

fig.add_subplot(111, frame_on=False)
plt.tick_params(labelcolor="none", bottom=False, left=False)
plt.ylabel('Probability Density')
plt.xlabel('Negative VADER Score')
plt.grid(False)

plt.tight_layout()

Some of the spikes at 0 are due to the same issues we saw previously: reviews with no terms deemed "negative". We know the data are imbalanced overall, with fewer negative and neutral reviews than positive reviews, which could explain some of these spikes, since the y-axis is normalized.
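One way to check how much of each spike comes from reviews with no negative terms is to tabulate, per star rating, the number of reviews and the share whose negative score is exactly 0. The sketch below uses made-up toy values purely to show the calculation; the column names `score` and `neg` follow the notebook, but the numbers are not from the dataset.

```python
import pandas as pd

# Toy data only: star rating and negative VADER score per review.
toy = pd.DataFrame({
    "score": [1, 1, 3, 5, 5, 5, 5, 5],
    "neg":   [0.20, 0.00, 0.05, 0.00, 0.00, 0.03, 0.00, 0.00],
})

# Share of reviews per star rating whose negative score is exactly zero.
zero_share = toy["neg"].eq(0).groupby(toy["score"]).mean()
n_reviews = toy.groupby("score").size()

summary = pd.DataFrame({"n_reviews": n_reviews, "share_neg_zero": zero_share})
print(summary)
```

Running the same tabulation separately on `pre_covid` and `post_covid` would show whether a spike reflects many zero-score reviews or simply a small group where a few reviews dominate the normalized density.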

In [51]:
# number of reviews with a negative score in each period
neg_volume = df_clean.groupby('Post-COVID Lockdown')['neg'].count()

fig, axs = plt.subplots(2, figsize=(10,10), sharex=True)
sns.boxplot(x='Post-COVID Lockdown', y='neg', data=df_clean, 
            palette=['palevioletred', 'mediumseagreen'], showfliers=False, ax=axs[0])
sns.barplot(x=neg_volume.index, y=neg_volume, 
            palette=['palevioletred', 'mediumseagreen'], ax=axs[1], ec='k')

axs[0].set_title('Negative VADER Scores Pre- and Post-Lockdown')
axs[0].set_xlabel('')
axs[0].set_ylabel('Negative VADER Score')
axs[1].set_xlabel('')
axs[1].set_ylabel('Number of Reviews')

plt.xticks([0, 1], ['Pre-Lockdown', 'Post-Lockdown'])
plt.tight_layout()
In [52]:
mean_diff, conf_int, p_low, p_high = bootstrap(pre_covid['neg'], post_covid['neg'])
if p_low < 0.05:
    print('The negative VADER score significantly decreased by a mean difference of {:.3f} in post-lockdown reviews; p={}'.format(np.abs(mean_diff), p_low))
elif p_high <0.05:    
    print('The negative VADER score significantly increased by a mean difference of {:.3f} in post-lockdown reviews; p={}'.format(np.abs(mean_diff), p_high))
else:
    print('No significant change in negative VADER score; p={}'.format(p_low))
The negative VADER score significantly increased by a mean difference of 0.004 in post-lockdown reviews; p=0.0

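The overall distribution of negative scores looks almost identical between the two time periods, although the bootstrap test reports a small but statistically significant increase (mean difference of 0.004) post-lockdown. With this many reviews, even a very small shift can reach significance.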

Sentiment Pre- and Post-Lockdown Conclusions

Overall, restaurant sentiment has decreased post-lockdown: the compound score is significantly lower, accompanied by a small increase in negative scores and no significant change in positive scores. However, more post-lockdown data would help solidify this.


Sentiment Pre- and Post-Lockdown by Region

Evaluate sentiment before and after lockdown for each county.

Return to Top: Sentiment Analysis

In [54]:
fig, axes = plt.subplots(13, 2, figsize=(10, 50))

# add one review with each star to each city before and after lockdown
add_city = []
add_star = []
for city in df_clean['city_x'].unique():
    for star in range(1,6):
        add_city.append(city)
        add_star.append(star)
add_stars = {'city_x':add_city, 'score':add_star}
# pd.concat replaces DataFrame.append, which was removed in pandas 2.0
post_stars = pd.concat([post_covid, pd.DataFrame(add_stars)])
pre_stars = pd.concat([pre_covid, pd.DataFrame(add_stars)])

for idx, city in enumerate(df_clean['city_x'].unique()):        
    
    # row = idx // 2, column = idx % 2 (even idx -> left, odd idx -> right)
    loc_x, loc_y = divmod(idx, 2)
    sns.barplot(x=post_stars.loc[post_stars['city_x']==city, 'score'].value_counts(normalize=True).sort_index().index,
            y=post_stars.loc[post_stars['city_x']==city, 'score'].value_counts(normalize=True).sort_index(),
            alpha=0.5, color='mediumseagreen', label='Post-Lockdown',
            ax=axes[loc_x, loc_y])
    sns.barplot(x=pre_stars.loc[pre_stars['city_x']==city, 'score'].value_counts(normalize=True).sort_index().index,
                y=pre_stars.loc[pre_stars['city_x']==city, 'score'].value_counts(normalize=True).sort_index(),
                alpha=0.5, color='palevioletred', label='Pre-Lockdown',
                ax=axes[loc_x, loc_y])
    axes[loc_x, loc_y].legend()
    axes[loc_x, loc_y].set_title(city)
    axes[loc_x, loc_y].set_ylabel('Frequency')
    axes[loc_x, loc_y].set_xlabel('Review Stars')

plt.tight_layout()

In many cities, the proportions of both 5-star and 1-star reviews appear to increase, indicating a decrease in neutral reviews. One region of note is Seattle (King, Washington), where the proportion of 1-star reviews decreased and the proportion of 5-star reviews dramatically increased, suggesting improved sentiment in the PNW.
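The plot above pads every city with one dummy review per star so that all five bars appear, which slightly distorts the proportions when a city has few post-lockdown reviews. A possible alternative, sketched here on toy data rather than the real dataset, is to compute the proportions first and then fill in missing star ratings with zero. The variable names below are illustrative.

```python
import pandas as pd

# Toy data only: county label and star rating per review.
toy = pd.DataFrame({
    "city_x": ["King, Washington"] * 4 + ["Cook, Illinois"] * 3,
    "score": [5, 5, 4, 1, 3, 3, 5],
})

stars = list(range(1, 6))

# Row-normalized table of star proportions per city; ratings that never
# occur in the data are added as explicit zero columns.
star_share = (
    pd.crosstab(toy["city_x"], toy["score"], normalize="index")
      .reindex(columns=stars, fill_value=0)
)
print(star_share)
```

Each row of `star_share` sums to 1 and can be passed to `sns.barplot` in place of the padded `value_counts` output.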

Compound VADER Scores by Region

Here, I compare compound VADER scores pre- and post-lockdown by county.

Return to Top: Sentiment Pre- and Post-Lockdown by Region

In [55]:
compound_vol_county = df_clean.groupby(['city_x', 'Post-COVID Lockdown']).count()['compound'].reset_index()

fig, axs = plt.subplots(2, figsize=(20,15), sharex=True)
sns.boxplot(x='city_x', y='compound', hue='Post-COVID Lockdown', data=df_clean, 
            palette=['palevioletred', 'mediumseagreen'], showfliers=False, ax=axs[0])
sns.barplot(x='city_x', y='compound', hue='Post-COVID Lockdown', data=compound_vol_county, 
            palette=['palevioletred', 'mediumseagreen'], ax=axs[1], ec='k')

axs[0].set_title('Compound VADER Scores Pre- and Post-Lockdown by Region', size=25)
axs[0].set_xlabel('')
axs[0].set_ylabel('Compound VADER Score')
axs[1].set_xlabel('')
axs[1].set_ylabel('Number of Reviews')
axs[0].get_legend().remove()
plt.legend(title='Before (0) or After (1) Lockdown')

plt.xticks(rotation=90)
plt.tight_layout()
In [56]:
# compare compound scores for each county pre- and post-lockdown
compound_county = []
for city in tqdm(df_clean['city_x'].unique()):  
    mean_diff, conf_int, p_low, p_high = bootstrap(df_clean.loc[(df_clean['city_x']==city) & (df_clean['Post-COVID Lockdown']==0), 'compound'],
                                                   df_clean.loc[(df_clean['city_x']==city) & (df_clean['Post-COVID Lockdown']==1), 'compound'])
    print('------')
    print('City: {}'.format(city))
    if p_low < 0.05:
        compound_county.append(city)
        print('The compound VADER score significantly decreased after lockdown; mean difference=-{:.2f}, p={}'.format(np.abs(mean_diff), p_low))
    elif p_high < 0.05:
        compound_county.append(city)
        print('The compound VADER score significantly increased after lockdown; mean difference={:.2f}, p={}'.format(np.abs(mean_diff), p_high))
    else:
        print('No significant difference in compound VADER score; p={}'.format(p_low))
------
City: Middlesex, Massachusetts
No significant difference in compound VADER score; p=0.7193
------
City: Philadelphia, Pennsylvania
The compound VADER score significantly decreased after lockdown; mean difference=-0.06, p=0.0042
------
City: Hartford, Connecticut
The compound VADER score significantly decreased after lockdown; mean difference=-0.09, p=0.0119
------
City: Jefferson, Louisiana
The compound VADER score significantly decreased after lockdown; mean difference=-0.03, p=0.0475
------
City: Nassau, New York
No significant difference in compound VADER score; p=0.1418
------
City: Rockland, New York
No significant difference in compound VADER score; p=0.1135
------
City: Hudson, New Jersey
No significant difference in compound VADER score; p=0.092
------
City: Suffolk, New York
No significant difference in compound VADER score; p=0.2636
------
City: Macomb, Michigan
No significant difference in compound VADER score; p=0.3986
------
City: Westchester, New York
No significant difference in compound VADER score; p=0.125
------
City: Essex, New Jersey
The compound VADER score significantly decreased after lockdown; mean difference=-0.06, p=0.0117
------
City: New Haven, Connecticut
No significant difference in compound VADER score; p=0.4197
------
City: Oakland, Michigan
No significant difference in compound VADER score; p=0.5971
------
City: Middlesex, New Jersey
No significant difference in compound VADER score; p=0.1073
------
City: New York City, New York
The compound VADER score significantly decreased after lockdown; mean difference=-0.10, p=0.0
------
City: Fairfield, Connecticut
No significant difference in compound VADER score; p=0.4705
------
City: Los Angeles, California
The compound VADER score significantly decreased after lockdown; mean difference=-0.08, p=0.0
------
City: Morris, New Jersey
No significant difference in compound VADER score; p=0.2401
------
City: Orleans, Louisiana
No significant difference in compound VADER score; p=0.8271
------
City: Bergen, New Jersey
The compound VADER score significantly decreased after lockdown; mean difference=-0.06, p=0.0262
------
City: Passaic, New Jersey
No significant difference in compound VADER score; p=0.2635
------
City: Wayne, Michigan
No significant difference in compound VADER score; p=0.2142
------
City: Cook, Illinois
The compound VADER score significantly decreased after lockdown; mean difference=-0.07, p=0.0001
------
City: King, Washington
No significant difference in compound VADER score; p=0.732
------
City: Union, New Jersey
No significant difference in compound VADER score; p=0.0833

In [57]:
# reviews with lowest compound scores in regions with sig. differences
for city in compound_county:
    lowest_score = post_covid.loc[post_covid['city_x']==city, 'compound'].min()
    for idx, row in post_covid[(post_covid['city_x']==city) & (post_covid['compound']==lowest_score)].iterrows():
        print('\n---City: {}---'.format(row['city_x']))
        print('Restaurant Name: {}'.format(row['name']))
        print('Review Score: {}'.format(row['score']))
        print('Review Date: {}\n'.format(row['publish_date']))
        print(row['description'])
---City: Philadelphia, Pennsylvania---
Restaurant Name: Han Dynasty
Review Score: 2.0
Review Date: 2020-03-27

COVID TAKEOUT REVIEW
Businesses are definitely struggling during these times and it saddens me to write anyone a negative review. However, I ordered fried rice from this location last night and had extremely unpleasant food poisoning since. The rice tasted stale/old and definitely funny, but I ignored it and chalked it up to weird flavoring. Big mistake. When rice specifically  is kept too long, it tends to breed a specific bacteria (b cereus) resulting in very painful food poisoning. It must be very hard to keep a business going right now, but old food, ESPECIALLY rice is not acceptable. Very disappointed in last nights meal. Won't be ordering from here.

---City: Hartford, Connecticut---
Restaurant Name: Union Kitchen
Review Score: 1.0
Review Date: 2020-03-15

Take Note:  This is my own experience.  I hope yours is different.

Never made it into the restaurant.  An online reservation and review site took my dates' reservation for 'Super Bowl Sunday'.  Reminders were sent from the app on behalf of the restaurant.  Cals. populated.

Alas, the restaurant was closed.

GM apology appeared disingenuous and greatly impersonal to my date.  A 'gift card' sent was indeed, insulting in both the amount and that it was used as the mechanism to convey the apology for a technical miscue.

It was as if the GM could barely be bothered and the 'self-imposed' penalty for their faux-paux caused the restaurant no pain, suffered no consequence as did the couple whose entire evening was planned around this restaurant.

I do wonder if there was a restaurant full of online reservations for that day.

This may be my first 1  for this class of restaurant.
Based upon service.  Never got to eat there.

---City: Jefferson, Louisiana---
Restaurant Name: Gyu-Kaku Japanese BBQ
Review Score: 2.0
Review Date: 2020-03-15

I've been to Gyu-Kaku Japanese BBQ multiple times before and have enjoyed my visits. However, I went today and received such poor service that I felt the need to write about it. I was neglected by my server, who walked by our table multiple times without checking up on us or bothering to refill our water. I understand if the restaurant was busy and you couldn't come by, but the restaurant had a maximum of three tables (including mine). We waited a while to receive our check. Not only did we have to wait to grab our server's attention, but he also failed to give us the dessert that had come with our course (considering we ordered the Gyu-Kaku course for 2). At that point, I didn't want to bother reminding him about dessert because my table and I were ready to leave.

---City: Essex, New Jersey---
Restaurant Name: Cuban Pete's
Review Score: 1.0
Review Date: 2020-04-18

What horrible management! First time I ordered from here (on seamless) and it took almost two hrs! My order totaled almost $100 I understand the hardship right now with  the virus and everything happening which is why I wasn't rude but when I called to see what the issue was the person didn't even ask for my name to see how much longer my order would be. He was so nasty and just told me the driver needs to come in to get the food(when she already called me and told me she went in twice) when I called back to ask after the third time the driver went in the guy said the driver just went out "good luck" ....what the hell does that even mean! I will never order from here again. Fine the food took two hrs no prob but the guy with his rudeness was unacceptable.

---City: New York City, New York---
Restaurant Name: Naked Dog
Review Score: 1.0
Review Date: 2020-03-16

I almost never write Yelp reviews, but I had such a creepy experience at this place that I had to leave this one. 
I was a repeat customer at this place, and I always leave a 30% tip and treat all their staff with the utmost respect.
I made a reservation for 8:15 and got there at 9:15 pm. The host seemed like he had had a few drinks when I got there. He was very obnoxious and very argumentative, which was very weird.
I had gone to restaurants where they did not have a table available, and they usually try to accommodate customers the best they can.
Once it was clear that the host is just trying to pick a fight with a customer, we left.
On their Google profile, I left a review that said "Rude Host".
The following is the response they left at 3 am,

"How inconsiderate! You arrived ONE hour late to your reservation and arrogantly requested your table, which we had given away because you didn't even give us a call to say you were running late. We are in the middle of COVID-19 emergency and operating at 50% of our legal capacity as requested by authorities. Obviously we cannot keep a reservation for one hour when we have half of our tables available.
Next time, I suggest you take your (always different) dates to a different restaurant. You are no longer welcome here."

I kid you not. You can visit their Google profile and see that this is real.
How creepy is this?

His response to a repeat customer's concern about his unprofessional behavior is some nonsense about the virus. Why is the virus an excuse for your rude behavior? He just does not get what the service industry is all about.

He is basically saying: "Yes, not only I am more rude and obnoxious than you thought, but I am also publically hostile toward customers. If you don't like it don't come here."

Good advice. And I would suggest anyone reading this review to do the same and stay away from this hostile environment.

Additionally, I do see that they have had other reviews about the staff's rude behavior. You can rude to me, but I will not allow you to be rude to my girlfriend. You owe both of us a serious apology.

 If the owner tried to reach out to me, either publically or privately, and demonstrated that this was only one bad apple that has been dealt with, I would consider taking this review down, but otherwise, I want everyone to know that the staff is very hostile.

---City: Los Angeles, California---
Restaurant Name: Mee and Greet
Review Score: 1.0
Review Date: 2020-04-12

Just had the most horrendous chicken sandwich from this restaurant. The meat was stringy and nasty. And it looks like someone else had the same experience and then got food poisoning. I can't believe I just spent $17 to get sick.

---City: Bergen, New Jersey---
Restaurant Name: Aumm Aumm Pizzeria and Wine Bar at The Brownstone
Review Score: 1.0
Review Date: 2020-04-06

I HAVE FOOD POISONING FROM THIS PLACE. This is THE WORST PIZZA I have ever tried. It made me puke upon tasting it. It is so horrible I just ran down into the dumpster and threw it away. I HAVE NEVER tasted pizza so salty, putrid, and disgusting as this pizza that was just delivered. My stomach is still hurting from this and they charged me 30$ for this small disgusting pizza.

---City: Cook, Illinois---
Restaurant Name: The Cheesecake Factory
Review Score: 1.0
Review Date: 2020-04-13

I am a healthcare worker in from out of town working with Covid patients. I've been here for two weeks and I have seven weeks to go. This has got to be the absolute worst service and meal I've had so far. I called talk to the manager Tom. And he simply said too bad so sad nothing I can do. Never dying here again! Not only am I making this post but I will put this on social media and tell all of my colleagues that are stuck in this hotel.
In [58]:
restaurant_df.loc[restaurant_df['city_x'].isin(compound_county), 'Compound Affected'] = 1
restaurant_df['Compound Affected'] = restaurant_df['Compound Affected'].fillna(0)
In [59]:
data = go.Scattergeo(lon = restaurant_df['longitude'],
                     lat = restaurant_df['latitude'],
                     text = restaurant_df['city_x'],
                     mode = 'markers',
                     marker = dict(colorscale=['cornflowerblue', 'mediumorchid']),
                     marker_color = restaurant_df['Compound Affected'],
                    )
layout = dict(title = 'Regions with Significantly Affected Compound Scores Post-Lockdown (Purple)',
              geo_scope = 'usa')
choromap = go.Figure(data=[data], layout=layout)
iplot(choromap)
choromap.write_html('plotly_figures/restaurant_distribution_significant_compound.html')

There isn't a clear geographic trend among regions with significantly different compound scores post-lockdown; however, many regions do show significant differences, possibly indicating that some have handled the transition in business strategy better than others. Sentiment decreased significantly in Philadelphia; Hartford; Jefferson, LA; Essex, NJ; New York City; Los Angeles; Bergen, NJ; and Cook, IL (Chicago).

In [60]:
pos_vol_county = df_clean.groupby(['city_x', 'Post-COVID Lockdown']).count()['pos'].reset_index()

fig, axs = plt.subplots(2, figsize=(20,15), sharex=True)
sns.boxplot(x='city_x', y='pos', hue='Post-COVID Lockdown', data=df_clean, 
            palette=['palevioletred', 'mediumseagreen'], showfliers=False, ax=axs[0])
sns.barplot(x='city_x', y='pos', hue='Post-COVID Lockdown', data=pos_vol_county, 
            palette=['palevioletred', 'mediumseagreen'], ax=axs[1], ec='k')

axs[0].set_title('Positive VADER Scores Pre- and Post-Lockdown by Region')
axs[0].set_xlabel('')
axs[0].set_ylabel('Positive VADER Score')
axs[1].set_xlabel('')
axs[1].set_ylabel('Number of Reviews')
axs[0].get_legend().remove()
plt.legend(title='Before (0) or After (1) Lockdown')

plt.xticks(rotation=90)
plt.tight_layout()
In [61]:
pos_county = []
for city in tqdm(df_clean['city_x'].unique()):  
    mean_diff, conf_int, p_low, p_high = bootstrap(df_clean.loc[(df_clean['city_x']==city) & (df_clean['Post-COVID Lockdown']==0), 'pos'],
                                                   df_clean.loc[(df_clean['city_x']==city) & (df_clean['Post-COVID Lockdown']==1), 'pos'])
    print('------')
    print('City: {}'.format(city))
    if p_low < 0.05:
        pos_county.append(city)
        print('The positive VADER score significantly decreased after lockdown; mean difference=-{:.2f}, p={}'.format(np.abs(mean_diff), p_low))
    elif p_high < 0.05:
        pos_county.append(city)
        print('The positive VADER score significantly increased after lockdown; mean difference={:.2f}, p={}'.format(np.abs(mean_diff), p_high))
    else:
        print('No significant difference in positive VADER score; p={}'.format(p_low))
------
City: Middlesex, Massachusetts
No significant difference in positive VADER score; p=0.8725
------
City: Philadelphia, Pennsylvania
No significant difference in positive VADER score; p=0.7768
------
City: Hartford, Connecticut
No significant difference in positive VADER score; p=0.1568
------
City: Jefferson, Louisiana
No significant difference in positive VADER score; p=0.1406
------
City: Nassau, New York
No significant difference in positive VADER score; p=0.5335
------
City: Rockland, New York
No significant difference in positive VADER score; p=0.4776
------
City: Hudson, New Jersey
No significant difference in positive VADER score; p=0.7363
------
City: Suffolk, New York
No significant difference in positive VADER score; p=0.3628
------
City: Macomb, Michigan
No significant difference in positive VADER score; p=0.6036
------
City: Westchester, New York
No significant difference in positive VADER score; p=0.1188
------
City: Essex, New Jersey
No significant difference in positive VADER score; p=0.7923
------
City: New Haven, Connecticut
No significant difference in positive VADER score; p=0.8598
------
City: Oakland, Michigan
No significant difference in positive VADER score; p=0.4418
------
City: Middlesex, New Jersey
No significant difference in positive VADER score; p=0.4946
------
City: New York City, New York
No significant difference in positive VADER score; p=0.3609
------
City: Fairfield, Connecticut
No significant difference in positive VADER score; p=0.7639
------
City: Los Angeles, California
The positive VADER score significantly increased after lockdown; mean difference=0.01, p=0.0125
------
City: Morris, New Jersey
No significant difference in positive VADER score; p=0.6035
------
City: Orleans, Louisiana
The positive VADER score significantly decreased after lockdown; mean difference=-0.04, p=0.04
------
City: Bergen, New Jersey
No significant difference in positive VADER score; p=0.318
------
City: Passaic, New Jersey
No significant difference in positive VADER score; p=0.7738
------
City: Wayne, Michigan
No significant difference in positive VADER score; p=0.7174
------
City: Cook, Illinois
No significant difference in positive VADER score; p=0.0997
------
City: King, Washington
The positive VADER score significantly increased after lockdown; mean difference=0.07, p=0.0081
------
City: Union, New Jersey
No significant difference in positive VADER score; p=0.8513

In [62]:
# reviews with lowest positive scores in regions with sig. differences
for city in pos_county:
    lowest_score = post_covid.loc[post_covid['city_x']==city, 'pos'].min()
    for idx, row in post_covid[(post_covid['city_x']==city) & (post_covid['pos']==lowest_score)].iterrows():
        print('\n---City: {}---'.format(row['city_x']))
        print('Restaurant Name: {}'.format(row['name']))
        print('Review Score: {}'.format(row['score']))
        print('Review Date: {}\n'.format(row['publish_date']))
        print(row['description'])
---City: Los Angeles, California---
Restaurant Name: Burgers Never Say Die
Review Score: 1.0
Review Date: 2020-04-18

Worst customer service, staff is beyond rude, disrespectful and unhelpful. We waited 30+ minutes and never even got our burgers.

---City: Los Angeles, California---
Restaurant Name: Dave's Hot Chicken
Review Score: 3.0
Review Date: 2020-03-31

Ordered the #2 combo on medium heat. 
This place is all misguided hype. It tasted very bland and the tender was dry in the slider. The medium heat level didn't really have any heat. Maybe that's where I messed up but it didn't have much flavor and was dry. Howlin Rays is much better than this place, I won't be back.

---City: Los Angeles, California---
Restaurant Name: Masa of Echo Park
Review Score: 5.0
Review Date: 2020-03-25

Grate service so far. Took my boys for pizza and dessert. Can't wait to dine in once again.

---City: Los Angeles, California---
Restaurant Name: BCD Tofu House
Review Score: 4.0
Review Date: 2020-04-08

Oh bcd, you never fail me. I ordered ubereats galbi, kimchi tofu soup and japchae. Sooooooo goood!!! A lot of Korean restaurants cause the runs but bcd is one of the only spots that will not cause that. Another plus side to this place is, it's 24 hours!!! Who doesn't crave food during after hours? Can't wait to go back again after quarantine.

---City: Los Angeles, California---
Restaurant Name: Park's BBQ
Review Score: 4.0
Review Date: 2020-03-28

My wife and I usually order the "galbitang" for delivery when we're craving meat and soup. The chucks of meat are plenty, and it usually feeds each one of us 2 meals. 4 stars because at times, the soup is very fatty and you can see the white deposits of fat when the soup gets cold.

---City: Los Angeles, California---
Restaurant Name: Maggiano's Little Italy
Review Score: 1.0
Review Date: 2020-03-29

Worst pasta I've ever had. Food poisoning after eating this horrible pasta. It was very crowded, noisy, and smelly.

---City: Los Angeles, California---
Restaurant Name: Guisados DTLA
Review Score: 4.0
Review Date: 2020-03-20

Authentic Mexican tacos.But they are very small and priced at 3.50$.Good food but overpriced.

---City: Los Angeles, California---
Restaurant Name: Carousel Restaurant
Review Score: 1.0
Review Date: 2020-03-24

In this difficult times, when staying home but not able to cook, decided to order for takeout. While waiting, watched people in and out of the restaurant. My order is delaying, but no complain, might not have their regular staff. One of the waiters, whom I have seen before When dining in, with gloves and face mask  walking out with a bag for delivery. I go out and see the man walking up and then crossing the street. Few minutes later he comes back, enters the kitchen still with the same gloves and face mask.  Oh my..

---City: Los Angeles, California---
Restaurant Name: Yang Chow
Review Score: 5.0
Review Date: 2020-03-16

Every time we are in court, the firm always stops by here for lunch. Slippery shrimp is due for!!!

---City: Los Angeles, California---
Restaurant Name: Dave's Hot Chicken
Review Score: 4.0
Review Date: 2020-03-22

I prefer going to this location over the one on Western because the line here is always shorter and significantly faster than the other restaurant's. There's also a nearby parking structure and lots of street parking available around this location. 

I usually order Combo 3 (Tender + Slider + Fries) when I come here and I have never been able to finish it. The food here really fills you up so I think the price point is reasonable. 

The reason I'm docking a star from this review is because the food does take a long time to come out. But again, this restaurant is still faster than the one on Western. So when I'm craving to eat at Dave's, it'll be this location.

---City: Los Angeles, California---
Restaurant Name: Dave's Hot Chicken
Review Score: 5.0
Review Date: 2020-04-12

Not ire on other reviews that say the chicken is dry. This food was tasty AF.

---City: Los Angeles, California---
Restaurant Name: Dave's Hot Chicken
Review Score: 3.0
Review Date: 2020-04-03

Ordered through Postmates and they forgot to add an extra slider resulting in one person not eating...and isn't it one sauce per slider? We only got one sauce total. This is frustrating. Please pay closer attention next time.

---City: Los Angeles, California---
Restaurant Name: TGI Korean BBQ
Review Score: 5.0
Review Date: 2020-03-16

It was a Friday night so this place was packed but the servers were able to attend to our needs quickly. Food was 8.5/10!

---City: Los Angeles, California---
Restaurant Name: The Oinkster
Review Score: 1.0
Review Date: 2020-04-17

Call in.  Place the order. Go there to p/u. 
All wrong. Reciept does not match the order. 
How fkn difficult? One job.  Flip burgers and collect. 
Guess it's too damn difficult.

---City: Los Angeles, California---
Restaurant Name: HATCH Yakitori + Bar
Review Score: 5.0
Review Date: 2020-04-18

=== Pandemic Pickup ===

I'm missing flame / smoke cooking. My building's outdoor grill is banned for use during the lockdown.

Hatch has re-opened after a brief closure and now offers signature bentos. The Schwarzen Egger contains a selection of skewers that hit all the classic yakitori flavors with whiffs of smokey char.

---City: Los Angeles, California---
Restaurant Name: Birdies
Review Score: 1.0
Review Date: 2020-04-04

Coronavirus ?!?! They forgot an item in our order. Rude owner blames it in the uber eats driver, said they've been eating customers food out of the bag. During the coronavirus epidemic,  birdies is the only restaurant in los angeles not sealing their togo bags. Containers were open. Disgusting. open bags. Seal your takeout bags, CORONAVIRUS !!!!

---City: Los Angeles, California---
Restaurant Name: Roscoe's House of Chicken & Waffles
Review Score: 3.0
Review Date: 2020-04-05

Ordered through grub hub and ordered cornbread, was given a dry ass biscuit in its place for the same price, should have not charged me if they didn't have the item!

---City: Los Angeles, California---
Restaurant Name: Sqirl
Review Score: 1.0
Review Date: 2020-03-29

Staff is trash!!

This place is overrate &amp; horrible customer service!
They always mess up my orders.
They can't never get it right.
Too many issues, don't know why people go back.

---City: Los Angeles, California---
Restaurant Name: Chadolpoong
Review Score: 4.0
Review Date: 2020-03-17

I only go here for their budae jjigae so I don't know how the other menu items are. But both times I've been here the food was decent, and I would suggest this place for dinner spots.

---City: Los Angeles, California---
Restaurant Name: Republique
Review Score: 2.0
Review Date: 2020-04-17

Over hyped and just mediocre experiences I've had for dinner and lunch.  Go to Auburn who's chef trained at Republique!!

---City: Orleans, Louisiana---
Restaurant Name: Two Sisters 'N Da East
Review Score: 2.0
Review Date: 2020-03-23

The food ain't the same since moving to the east it doesn't taste the same. The potato salad is nasty the portions are small.

---City: King, Washington---
Restaurant Name: Domino's Pizza
Review Score: 1.0
Review Date: 2020-04-15

When you order cinnamon sticks and they don't tell you they stop giving frosting with them. Then when you call in some kid Jason handles the situation with you should of known tough shit attitude.
In [63]:
# reviews with highest positive scores in regions with sig. differences
for city in pos_county:
    highest_score = post_covid.loc[post_covid['city_x']==city, 'pos'].max()
    for idx, row in post_covid[(post_covid['city_x']==city) & (post_covid['pos']==highest_score)].iterrows():
        print('\n---City: {}---'.format(row['city_x']))
        print('Restaurant Name: {}'.format(row['name']))
        print('Review Score: {}'.format(row['score']))
        print('Review Date: {}\n'.format(row['publish_date']))
        print(row['description'])
---City: Los Angeles, California---
Restaurant Name: Birdies
Review Score: 5.0
Review Date: 2020-03-27

All I can say is Chanel is the best. Thank you for your hospitality! Donuts were super yummy!

---City: Orleans, Louisiana---
Restaurant Name: Castnet Seafood
Review Score: 5.0
Review Date: 2020-03-19

Looking for a good place to buy LIVE crawfish?!? This is the spot! Crawfish comes already clean for you. LIVE And CLEAN! What could be better than that? This is my go to spot when I'm doing a boil for any gathering at my house.

---City: King, Washington---
Restaurant Name: Huxdotter Coffee
Review Score: 5.0
Review Date: 2020-04-03

Awesome place! Great baked goods and the people are very friendly. 
Definitely try it out!
In [64]:
restaurant_df.loc[restaurant_df['city_x'].isin(pos_county), 'Positive Affected'] = 1
restaurant_df['Positive Affected'] = restaurant_df['Positive Affected'].fillna(0)

data = go.Scattergeo(lon = restaurant_df['longitude'],
                     lat = restaurant_df['latitude'],
                     text = restaurant_df['city_x'],
                     mode = 'markers',
                     marker = dict(colorscale=['cornflowerblue', 'mediumorchid']),
                     marker_color = restaurant_df['Positive Affected'],
                    )
layout = dict(title = 'Regions with Significantly Affected Positive Scores Post-Lockdown (Purple)',
              geo_scope = 'usa')
choromap = go.Figure(data=[data], layout=layout)
iplot(choromap)
choromap.write_html('plotly_figures/restaurant_distribution_significant_positive.html')

Again, there isn't a clear geographic trend among regions with significantly different positive scores post-lockdown. The only region with a significantly decreased positive score is Orleans, LA (New Orleans); Los Angeles and King, WA (Seattle) both showed a significant increase in positive scores.

Negative VADER Scores by Region

Here, I examine negative VADER scores pre- and post-lockdown by region.

Return to Top: Sentiment Pre- and Post-Lockdown by Region

In [65]:
neg_vol_county = df_clean.groupby(['city_x', 'Post-COVID Lockdown']).count()['neg'].reset_index()

fig, axs = plt.subplots(2, figsize=(20,15), sharex=True)
sns.boxplot(x='city_x', y='neg', hue='Post-COVID Lockdown', data=df_clean, 
            palette=['palevioletred', 'mediumseagreen'], showfliers=False, ax=axs[0])
sns.barplot(x='city_x', y='neg', hue='Post-COVID Lockdown', data=neg_vol_county, 
            palette=['palevioletred', 'mediumseagreen'], ax=axs[1], ec='k')

axs[0].set_title('Negative VADER Scores Pre- and Post-Lockdown by Region')
axs[0].set_xlabel('')
axs[0].set_ylabel('Negative VADER Score')
axs[1].set_xlabel('')
axs[1].set_ylabel('Number of Reviews')
axs[0].get_legend().remove()
plt.legend(title='Before (0) or After (1) Lockdown')

plt.xticks(rotation=90)
plt.tight_layout()
In [66]:
neg_county = []
for city in tqdm(df_clean['city_x'].unique()):  
    mean_diff, conf_int, p_low, p_high = bootstrap(df_clean.loc[(df_clean['city_x']==city) & (df_clean['Post-COVID Lockdown']==0), 'neg'],
                                                   df_clean.loc[(df_clean['city_x']==city) & (df_clean['Post-COVID Lockdown']==1), 'neg'])
    print('------')
    print('City: {}'.format(city))
    if p_low < 0.05:
        neg_county.append(city)
        print('The negative VADER score significantly decreased after lockdown; mean difference=-{:.2f}, p={}'.format(np.abs(mean_diff), p_low))
    elif p_high < 0.05:
        neg_county.append(city)
        print('The negative VADER score significantly increased after lockdown; mean difference={:.2f}, p={}'.format(np.abs(mean_diff), p_high))
    else:
        print('No significant difference in negative VADER score; p={}'.format(p_low))
------
City: Middlesex, Massachusetts
No significant difference in negative VADER score; p=0.423
------
City: Philadelphia, Pennsylvania
No significant difference in negative VADER score; p=0.5787
------
City: Hartford, Connecticut
The negative VADER score significantly increased after lockdown; mean difference=0.02, p=0.0032
------
City: Jefferson, Louisiana
The negative VADER score significantly increased after lockdown; mean difference=0.01, p=0.0101
------
City: Nassau, New York
The negative VADER score significantly increased after lockdown; mean difference=0.05, p=0.036
------
City: Rockland, New York
No significant difference in negative VADER score; p=0.7423
------
City: Hudson, New Jersey
No significant difference in negative VADER score; p=0.633
------
City: Suffolk, New York
The negative VADER score significantly increased after lockdown; mean difference=0.03, p=0.0403
------
City: Macomb, Michigan
No significant difference in negative VADER score; p=0.929
------
City: Westchester, New York
No significant difference in negative VADER score; p=0.8552
------
City: Essex, New Jersey
No significant difference in negative VADER score; p=0.7192
------
City: New Haven, Connecticut
No significant difference in negative VADER score; p=0.3845
------
City: Oakland, Michigan
No significant difference in negative VADER score; p=0.1754
------
City: Middlesex, New Jersey
No significant difference in negative VADER score; p=0.6652
------
City: New York City, New York
The negative VADER score significantly increased after lockdown; mean difference=0.02, p=0.0002
------
City: Fairfield, Connecticut
No significant difference in negative VADER score; p=0.2039
------
City: Los Angeles, California
The negative VADER score significantly increased after lockdown; mean difference=0.01, p=0.0005
------
City: Morris, New Jersey
No significant difference in negative VADER score; p=0.2213
------
City: Orleans, Louisiana
No significant difference in negative VADER score; p=0.3926
------
City: Bergen, New Jersey
No significant difference in negative VADER score; p=0.9253
------
City: Passaic, New Jersey
No significant difference in negative VADER score; p=0.6845
------
City: Wayne, Michigan
No significant difference in negative VADER score; p=0.7871
------
City: Cook, Illinois
The negative VADER score significantly increased after lockdown; mean difference=0.01, p=0.02
------
City: King, Washington
No significant difference in negative VADER score; p=0.5518
------
City: Union, New Jersey
No significant difference in negative VADER score; p=0.8464

In [67]:
# reviews with highest negative scores in regions with sig. differences
for city in neg_county:
    highest_score = post_covid.loc[post_covid['city_x']==city, 'neg'].max()
    for idx, row in post_covid[(post_covid['city_x']==city) & (post_covid['neg']==highest_score)].iterrows():
        print('\n---City: {}---'.format(row['city_x']))
        print('Restaurant Name: {}'.format(row['name']))
        print('Review Score: {}'.format(row['score']))
        print('Review Date: {}\n'.format(row['publish_date']))
        print(row['description'])
---City: Hartford, Connecticut---
Restaurant Name: Villa Of Lebanon
Review Score: 5.0
Review Date: 2020-04-09

When it comes to authentic northern Lebanese cuisine no other restaurant in CT can come close to this in fancy modest but delicious food.
Step in and do t be fooled by the appearance as you will lose yourself with every bite and forget you are in Hartford county .
Give it a try you won't be disappointed.

---City: Jefferson, Louisiana---
Restaurant Name: Bevi Seafood
Review Score: 1.0
Review Date: 2020-04-04

Just went to Bevi for crawfish. 3 cashiers ignored me while I was in line in front of the register. I left. F-y'all crawfish. It ain't even that good. I'm not paying for horrible service.

---City: Nassau, New York---
Restaurant Name: Filli's Deli & Bakery
Review Score: 1.0
Review Date: 2020-04-08

This place sucks. The employees are rude and the food is nothing special. They don't understand the concept of customer service or basic kindness. Portions are small. Take your business elsewhere.

---City: Suffolk, New York---
Restaurant Name: Panera Bread
Review Score: 1.0
Review Date: 2020-04-14

My green tea lemonade was bad and the woman at the other end of the line  told me I was wrong. The drink smelled of a sour fruity alcohol. They DID give me a refund, AFTER they told me that I was wrong. Unfortunately, my go to Panera is not one that I will be able to visit anymore. I'm quite upset that they needed to call me a liar about the issue. I was told it was just the pulp from the lemons, clearly you can see that is not the case here. Lemon pulp is not a creamy white substance. I'm beyond repulsed that I have ingested this stuff unknowingly. Here is a pic to show you the black clump and white mold on the bottom of the cup of this drink. I pray I don't get sick in the time of COVID-19 from a spoiled drink. I'm really really disappointed.

---City: New York City, New York---
Restaurant Name: Lucali
Review Score: 1.0
Review Date: 2020-03-16

I have never been more mad in my life. Waiting 5 hours just to be served by the rudest staff I have ever met. On top of that, they make you pay 20% gratuity on a 4 top for the worst service ever. The arrogance of this overrate tourist trap is beyond reason.

---City: Los Angeles, California---
Restaurant Name: Maggiano's Little Italy
Review Score: 1.0
Review Date: 2020-03-29

Worst pasta I've ever had. Food poisoning after eating this horrible pasta. It was very crowded, noisy, and smelly.

---City: Cook, Illinois---
Restaurant Name: Gibsons Bar & Steakhouse
Review Score: 1.0
Review Date: 2020-03-25

Terrible terrible experience. I ordered two filet steaks, medium, and I got raw, uncooked meat. Very frustrated and will never go to a Gibson's again. How does this pass as medium cooked?
In [68]:
restaurant_df.loc[restaurant_df['city_x'].isin(neg_county), 'Negative Affected'] = 1
restaurant_df['Negative Affected'] = restaurant_df['Negative Affected'].fillna(0)

data = go.Scattergeo(lon = restaurant_df['longitude'],
                     lat = restaurant_df['latitude'],
                     text = restaurant_df['city_x'],
                     mode = 'markers',
                     marker = dict(colorscale=['cornflowerblue', 'mediumorchid']),
                     marker_color = restaurant_df['Negative Affected'],
                    )
layout = dict(title = 'Regions with Significantly Affected Negative Scores Post-Lockdown (Purple)',
              geo_scope = 'usa')
choromap = go.Figure(data=[data], layout=layout)
iplot(choromap)
choromap.write_html('plotly_figures/restaurant_distribution_significant_negative.html')

A number of counties experienced a significant increase in negative scores (i.e., worse sentiment), including Hartford; Jefferson, LA; Nassau, NY; Suffolk, NY; New York City; Los Angeles; and Cook, IL.


Sentiment Analysis Conclusions

While it's difficult to draw broad conclusions with a limited sample size, in general, there is a decrease in sentiment post-lockdown.

Sentiment is variable on a by-restaurant basis, and there are no clear overall geographical trends. This could be studied further with more restaurant data.

Examining all reviews together, there appears to be a decrease in sentiment post-lockdown due to an increase in negative reviews; however, the post-lockdown sample size is very small. At the time of scraping, only a month of reviews was available; a repeat analysis with more recent data would be more informative.

Regionally, while some counties don't have clear trends, many show a decrease in sentiment, again due to an increase in negative reviews. Highly coronavirus-affected areas such as New York City, Los Angeles, and adjacent counties are included in this list.

Return to Top: Sentiment Analysis


Modeling of Reviews

  a. Additional Data Cleaning
  b. TF-IDF Modeling
  c. KMeans Clustering
  d. PCA
  e. Topic Modeling
  f. Coronavirus-Like Review Sentiment
        Sentiment of the Coronavirus Cluster
        Comparison of Sentiment by Cluster
        Sample Reviews for the Coronavirus-Like Cluster
        Geographical Distribution of Coronavirus-Like Reviews

  g. Modeling Conclusions

Return to Contents


Additional Data Cleaning

VADER requires some text components to remain intact, such as capitalization and punctuation, since these can influence sentiment. However, for the remainder of the analysis, we need to further clean the review text.

In [69]:
nlp = spacy.load('en')

def sentiment_tokenizer_complete(review):
    # remove new lines
    review = review.replace('\n', ' ')
    
    mytokens = nlp(review)
    
    # lemmatize; lowercase; remove spaces, punctuation, and numbers
    mytokens = [word.lemma_.lower() if word.lemma_ != '-PRON-' else word.lower_ for word in mytokens 
                if word.is_space==False and 
                word.is_punct==False and
                word.pos_!='NUM']
    
    # remove stop words
    mytokens = [word for word in mytokens if word not in STOP_WORDS]
    
    # remove words <=1 letter
    mytokens = [word for word in mytokens if len(word)>1]
        
    # join tokens back into a sentence
    clean_review = ' '.join(mytokens)
    
    return clean_review
In [70]:
%time df_clean['clean_review'] = df_clean['description'].apply(sentiment_tokenizer_complete)
CPU times: user 2h 10min 50s, sys: 5min 44s, total: 2h 16min 35s
Wall time: 1h 34min 39s
In [71]:
# checkpoint in case the kernel dies
df_clean.to_csv('checkpoint2.csv', index=False)
In [72]:
# in case the kernel dies
df_clean = pd.read_csv('checkpoint2.csv')

TF-IDF Modeling

Here, I use TF-IDF to vectorize the reviews for further analyses.

For TF-IDF modeling, I've limited the sample to reviews published on or after March 1, 2020. Using all of the data takes >12GB of memory and crashes the kernel on my computer; one way around this would be to run the clustering on a remote server.

Return to Top: Modeling of Reviews
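
Converting the TF-IDF output to a dense array, as done later in this section, may account for part of the memory pressure described above. The following is a minimal sketch on a handful of made-up reviews (the `toy_*` names and the review strings are hypothetical, not the Yelp data). It shows that the output of `fit_transform` can be left as a SciPy sparse matrix and passed to `KMeans` directly. Whether this would be enough to fit the full dataset in memory is untested here.

```python
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

# hypothetical stand-in for df_clean['clean_review']
toy_reviews = [
    'pizza crust good pie',
    'pizza pie place good',
    'order delivery late cold',
    'delivery order wrong time',
    'staff friendly great food',
    'friendly staff good food',
]

# fit_transform returns a scipy sparse matrix; skipping .toarray() keeps it sparse
toy_tfidf = TfidfVectorizer()
toy_matrix = toy_tfidf.fit_transform(toy_reviews)

# share of non-zero entries; real review data is typically far sparser than this toy case
toy_density = toy_matrix.nnz / (toy_matrix.shape[0] * toy_matrix.shape[1])

# KMeans accepts sparse input, so no dense copy is needed for clustering
toy_kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
toy_labels = toy_kmeans.fit_predict(toy_matrix)

print(toy_matrix.shape, round(toy_density, 2))
print(toy_labels)
```

The trade-off is that a sparse matrix cannot be wrapped in a regular `DataFrame` with `word_` columns without densifying it, so the per-cluster feature means would have to be computed another way.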

In [73]:
# shape prior to reducing size
df_clean.shape
Out[73]:
(320392, 25)
In [74]:
# only use reviews since march 1 to reduce size
df_clean['publish_date'] = pd.to_datetime(df_clean['publish_date'])
df_short = df_clean[df_clean['publish_date']>='2020-03-01']
df_short.shape
Out[74]:
(11267, 25)
In [75]:
# add tf-idfs columns
tfidf = TfidfVectorizer(min_df = 10)
tfidf_result = tfidf.fit_transform(df_short['clean_review']).toarray()
tfidf_df = pd.DataFrame(tfidf_result, columns = tfidf.get_feature_names())
tfidf_df.columns = ["word_" + str(x) for x in tfidf_df.columns]
tfidf_df.index = df_short.index
tfidf_df.shape
Out[75]:
(11267, 3439)

KMeans Clustering

Using the elbow method, I test different values of k to determine how many clusters should be used with the TF-IDF data.

Return to Top: Modeling of Reviews

In [76]:
# test different values of k
ks = range(2,35)
inertias = []

for k in ks:
    model = KMeans(n_clusters=k)
    model.fit(tfidf_df)
    inertias.append(model.inertia_)
In [77]:
plt.figure(figsize=(15,5))
plt.plot(ks, inertias, '-o')
plt.xlabel('Number of clusters, k')
plt.ylabel('Inertia')
plt.xticks(ks)
plt.show()

There isn't a clear elbow, although at 13 there is a slight plateau.
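
When the elbow is this ambiguous, a second criterion can help sanity-check the choice of k. The sketch below uses the silhouette score, which is a different technique from the elbow method used above and was not part of the original analysis. It runs on synthetic 2-D blobs (`toy_X` and the other `toy_*` names are hypothetical), not on the TF-IDF data, and fixes `random_state` so the run is repeatable.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)

# three tight, well-separated synthetic blobs (illustration only)
toy_centers = np.array([[0, 0], [10, 10], [-10, 10]])
toy_X = np.vstack([c + rng.normal(scale=0.5, size=(50, 2)) for c in toy_centers])

# mean silhouette score for each candidate k; higher means better-separated clusters
toy_scores = {}
for k in range(2, 7):
    toy_model = KMeans(n_clusters=k, n_init=10, random_state=0)
    toy_scores[k] = silhouette_score(toy_X, toy_model.fit_predict(toy_X))

best_k = max(toy_scores, key=toy_scores.get)
print(toy_scores)
print('k with highest silhouette score:', best_k)
```

On real review text the scores are usually much lower and flatter than on blobs like these, so this is best read alongside the inertia curve rather than as a replacement for it.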

In [78]:
# model to assign values - going with 13 clusters
kmeans = KMeans(n_clusters=13)
y_pred = kmeans.fit_predict(tfidf_df)
In [79]:
# add cluster assignment to the tfidf dataframe
tfidf_df['cluster'] = y_pred

plt.figure(figsize=(15,5))
plt.bar([i for i in range(1, 14)], tfidf_df['cluster'].value_counts(sort=False), ec='k')
plt.xticks([i for i in range(1, 14)])
plt.xlabel('Cluster')
plt.ylabel('Number of Reviews')
Out[79]:
Text(0, 0.5, 'Number of Reviews')

Most reviews are in the 13th cluster. Next, I've performed a PCA to visualize the clusters.
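
One caveat about the bar chart above: `value_counts(sort=False)` does not promise that the counts come back ordered by cluster label, so the bar positions may not line up with the labels KMeans assigned. A small sketch on made-up labels (`toy_clusters` is hypothetical) shows how the counts can be ordered by label explicitly:

```python
import pandas as pd

# hypothetical cluster assignments
toy_clusters = pd.Series([2, 0, 2, 1, 2, 0], name='cluster')

# count reviews per cluster, then order the result by cluster label
toy_counts = toy_clusters.value_counts().sort_index()
print(toy_counts)
```

Plotting `toy_counts.index` against `toy_counts.values` would then keep each bar under its own 0-based cluster label.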


Principal Component Analysis

I use a PCA to reduce the dimensions of the TF-IDF data and visualize the clusters on a 2-D plot.

Return to Top: Modeling of Reviews
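
A 2-D projection is only as informative as the share of variance its two components retain. As a minimal sketch on random data (`toy_features` is a hypothetical stand-in for the TF-IDF matrix), the fitted PCA exposes this through `explained_variance_ratio_`:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# hypothetical stand-in for the TF-IDF feature matrix: 100 rows, 20 features
toy_features = rng.random((100, 20))

toy_pca = PCA(n_components=2)
toy_pca.fit(toy_features)

# fraction of total variance captured by each of the two components
toy_ratio = toy_pca.explained_variance_ratio_
print(toy_ratio, toy_ratio.sum())
```

If the two components together explain only a small fraction of the variance, overlap between clusters in the scatter plot may reflect the projection rather than the clustering itself.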

In [80]:
# pca to visualize clusters
pca = PCA(n_components=2)

# fit the PCA on the TF-IDF features (without the cluster label) and project to 2 components
pca_features = pca.fit_transform(tfidf_df.drop(columns='cluster'))

# copy the cluster column so new columns can be added without a SettingWithCopyWarning
df_pca = tfidf_df[['cluster']].copy()
In [81]:
# Assign 0th column of pca_features: xs
df_pca['x'] = pca_features[:,0]

# Assign 1st column of pca_features: ys
df_pca['y'] = pca_features[:,1]

# Scatter plot of first and second component of PCA
plt.figure(figsize=(10,10))
for cluster in range(13):
    plt.scatter(df_pca.loc[df_pca['cluster']==cluster, 'x'], df_pca.loc[df_pca['cluster']==cluster, 'y'], label=cluster, alpha=0.5)
plt.legend()
plt.title('PCA of TF-IDF')
plt.show()
In [82]:
import plotly.graph_objects as go
import numpy as np

fig = go.Figure()


for cluster in range(13):
    fig.add_trace(go.Scatter(x=df_pca.loc[df_pca['cluster']==cluster, 'x'],
                             y=df_pca.loc[df_pca['cluster']==cluster, 'y'],
                             name=cluster,
                             mode='markers'))


# Set options common to all traces with fig.update_traces
fig.update_traces(mode='markers', marker_line_width=2, marker_size=10)
fig.update_layout(title='Interactive PCA',
                  yaxis_zeroline=False, xaxis_zeroline=False)

fig.write_html('plotly_figures/interactive_pca.html')
fig.show()

Based on these visualizations, we can see how the clusters might be separate, although they appear to overlap. This makes sense: words such as "restaurant" or "service" are likely to be used in all kinds of restaurant reviews.
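
One way to judge how much a 2-D projection can show is to look at the share of variance the two components retain. The sketch below uses random data as a stand-in for a wide feature matrix (the shape and seed are assumptions); it is not the notebook's TF-IDF result.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# synthetic stand-in for a wide, noisy feature matrix (assumption: 200 rows, 50 columns)
X = rng.normal(size=(200, 50))

pca = PCA(n_components=2)
pca.fit(X)

# fraction of the total variance captured by each of the two components
ratios = pca.explained_variance_ratio_
print('Variance explained by 2 components: {:.1%}'.format(ratios.sum()))
```

If the total is low, overlap in the 2-D plot does not necessarily mean the clusters overlap in the full feature space.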

In [83]:
# get mean for each vector by cluster
tfidf_means = tfidf_df.groupby('cluster').mean()
top_feats = {}
for cluster in range(13):
    top_feats[cluster] = tfidf_means.iloc[cluster].sort_values(ascending=False).head(5).index.to_list()
    print('Cluster: {}'.format(cluster))
    print([feat.split('word_')[1] for feat in top_feats[cluster]])
    print('---')
Cluster: 0
['good', 'food', 'place', 'like', 'restaurant']
---
Cluster: 1
['burger', 'fry', 'good', 'great', 'place']
---
Cluster: 2
['pizza', 'good', 'crust', 'place', 'pie']
---
Cluster: 3
['order', 'food', 'delivery', 'time', 'pick']
---
Cluster: 4
['staff', 'friendly', 'food', 'great', 'good']
---
Cluster: 5
['noodle', 'soup', 'thai', 'raman', 'dumpling']
---
Cluster: 6
['wait', 'come', 'table', 'time', 'food']
---
Cluster: 7
['love', 'food', 'place', 'amazing', 'good']
---
Cluster: 8
['recommend', 'highly', 'food', 'place', 'great']
---
Cluster: 9
['chicken', 'fry', 'good', 'sandwich', 'order']
---
Cluster: 10
['taco', 'good', 'order', 'mexican', 'food']
---
Cluster: 11
['sushi', 'roll', 'good', 'fresh', 'place']
---
Cluster: 12
['great', 'food', 'service', 'good', 'place']
---

Topic Modeling

Here, I use the clusters to identify terms and sentiment of terms associated with COVID-19.

Return to Top: Modeling of Reviews

In [84]:
# see which clusters seem to have the largest values for coronavirus words
coronavirus_words = ['covid', 'coronavirus', 'corona', 'covid-19', 'virus', 'pandemic', 'quarantine']
for word in coronavirus_words:
    if 'word_{}'.format(word) in tfidf_means.columns:
        max_value = tfidf_means['word_{}'.format(word)].max()
        max_cluster = tfidf_means[tfidf_means['word_{}'.format(word)]==max_value].index.to_list()

        print('"{}": Cluster {}'.format(word.capitalize(), max_cluster[0]))
"Covid": Cluster 3
"Coronavirus": Cluster 3
"Corona": Cluster 3
"Virus": Cluster 3
"Pandemic": Cluster 3
"Quarantine": Cluster 3
In [85]:
# see which clusters seem to have the largest values for random words as a sanity check
not_coronavirus_words = ['mexican', 'pasta', 'sushi', 'takeout', 'service', 'chicken', 'ambiance']
for word in not_coronavirus_words:
    if 'word_{}'.format(word) in tfidf_means.columns:
        max_value = tfidf_means['word_{}'.format(word)].max()
        max_cluster = tfidf_means[tfidf_means['word_{}'.format(word)]==max_value].index.to_list()

        print('"{}": Cluster {}'.format(word.capitalize(), max_cluster[0]))
"Mexican": Cluster 10
"Pasta": Cluster 8
"Sushi": Cluster 11
"Takeout": Cluster 3
"Service": Cluster 12
"Chicken": Cluster 9
"Ambiance": Cluster 12

Coronavirus terms appear to cluster together. The cluster number changes each time KMeans is run, so I look it up and print it below instead of hard-coding it. As a sanity check, the second cell runs the same lookup on unrelated terms to see where they cluster. Next, I'll investigate what other words are associated with the coronavirus cluster.
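
The changing cluster numbers come from KMeans' random initialization. A minimal sketch of pinning the labels with `random_state` follows; the data, the seed value 42, and `n_init=10` are assumptions, and the notebook itself runs KMeans unseeded.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))  # toy data standing in for the TF-IDF matrix

# two independent fits with the same seed start from the same initial centroids
labels_a = KMeans(n_clusters=4, n_init=10, random_state=42).fit_predict(X)
labels_b = KMeans(n_clusters=4, n_init=10, random_state=42).fit_predict(X)

print('Identical labels across runs:', np.array_equal(labels_a, labels_b))
```

Looking the cluster up by its highest mean TF-IDF value, as done here, stays robust either way.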

In [86]:
if 'word_coronavirus' in tfidf_means.columns:
    max_value = tfidf_means['word_coronavirus'].max()
    corona_cluster = tfidf_means[tfidf_means['word_coronavirus']==max_value].index.to_list()[0]

    print('Coronavirus cluster: {}'.format(corona_cluster))
Coronavirus cluster: 3
In [87]:
print('Top terms associated with COVID-19-like Terms:')
print([feat.split('word_')[1] for feat in tfidf_means.iloc[corona_cluster].sort_values(ascending=False).head(50).index.to_list()])
Top terms associated with COVID-19-like Terms:
['order', 'food', 'delivery', 'time', 'pick', 'restaurant', 'place', 'delicious', 'come', 'good', 'try', 'support', 'takeout', 'deliver', 'want', 'eat', 'wait', 'great', 'business', 'thank', 'like', 'service', 'minute', 'meal', 'customer', 'tell', 'phone', 'covid', 'home', 'local', 'ask', 'online', 'right', 'rice', 'open', 'ready', 'pickup', 'definitely', 'people', 'chicken', 'amazing', 'know', 'amp', 'dinner', 'sauce', 'salad', 'portion', 'menu', 'price', 'today']
In [88]:
## examine reviews from the corona cluster
# join tfidf to df_short
df_join = df_short.merge(tfidf_df, left_index=True, right_index=True)
In [89]:
# wordcloud of corona cluster
font_path = '/System/Library/Fonts/Supplemental/DIN Condensed Bold.ttf'
from palettable.colorbrewer.sequential import GnBu_9, Reds_8
def color_func(word, font_size, position, orientation, random_state=None, **kwargs):
    if word in coronavirus_words:
        return tuple(Reds_8.colors[random.randint(3,7)])
    return tuple(GnBu_9.colors[random.randint(3,8)])

wc = WordCloud(font_path=font_path, 
               background_color="white", 
               width=1000, 
               height=600,
               max_words=500,
               max_font_size=300, 
               random_state=42)

plt.figure(figsize=(15,15))

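# join the reviews into one string; str() on a Series yields its truncated repr, not the full text
wc.generate(' '.join(df_join.loc[df_join['cluster']==corona_cluster, 'clean_review'].astype(str)))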
wc.recolor(color_func=color_func, random_state=3)

wc.to_file('covid_wordcloud.png')

plt.imshow(wc, interpolation='bilinear')
plt.axis("off")
plt.show()

Coronavirus-Like Review Sentiment

Now that I've identified a cluster of reviews that associate with coronavirus-like terms, I look at the sentiment of these reviews compared to others.

Return to Top: Modeling of Reviews

Sentiment of the Coronavirus Cluster

First, I compare review sentiment in the coronavirus cluster with sentiment across all clusters.

In [90]:
# examine range of VADER scores and review scores
print('Coronavirus Cluster Statistics:')
print('Compound VADER scores:')
print('Mean: {:.4f}'.format(df_join.loc[df_join['cluster']==corona_cluster, 'compound'].mean()))
print('Low: {}; High: {}'.format(df_join.loc[df_join['cluster']==corona_cluster, 'compound'].min(), df_join.loc[df_join['cluster']==corona_cluster, 'compound'].max()))
print('---')
print('Positive VADER scores:')
print('Mean: {:.4f}'.format(df_join.loc[df_join['cluster']==corona_cluster, 'pos'].mean()))
print('Low: {}; High: {}'.format(df_join.loc[df_join['cluster']==corona_cluster, 'pos'].min(), df_join.loc[df_join['cluster']==corona_cluster, 'pos'].max()))
print('---')
print('Negative VADER scores:')
print('Mean: {:.4f}'.format(df_join.loc[df_join['cluster']==corona_cluster, 'neg'].mean()))
print('Low: {}; High: {}'.format(df_join.loc[df_join['cluster']==corona_cluster, 'neg'].min(), df_join.loc[df_join['cluster']==corona_cluster, 'neg'].max()))
print('\n---')
print('Overall Statistics:')
print('Compound VADER scores:')
print('Mean: {:.4f}'.format(df_join['compound'].mean()))
print('Low: {}; High: {}'.format(df_join['compound'].min(), df_join['compound'].max()))
print('---')
print('Positive VADER scores:')
print('Mean: {:.4f}'.format(df_join['pos'].mean()))
print('Low: {}; High: {}'.format(df_join['pos'].min(), df_join['pos'].max()))
print('---')
print('Negative VADER scores:')
print('Mean: {:.4f}'.format(df_join['neg'].mean()))
print('Low: {}; High: {:.4f}'.format(df_join['neg'].min(), df_join['neg'].max()))
print('\n---')
Coronavirus Cluster Statistics:
Compound VADER scores:
Mean: 0.5703
Low: -0.9866; High: 0.9987
---
Positive VADER scores:
Mean: 0.2844
Low: 0.0; High: 0.765
---
Negative VADER scores:
Mean: 0.0866
Low: 0.0; High: 0.495

---
Overall Statistics:
Compound VADER scores:
Mean: 0.7883
Low: -0.9866; High: 0.9992
---
Positive VADER scores:
Mean: 0.3812
Low: 0.0; High: 0.872
---
Negative VADER scores:
Mean: 0.0558
Low: 0.0; High: 0.6890

---
In [91]:
# VADER scores for cluster corona_cluster v all other clusters
fig, axes = plt.subplots(2, 3, figsize=(15, 10))

# pass the data as x= so the call also works in seaborn versions that reject positional data
sns.boxplot(x=df_join['compound'], color='palevioletred', ax=axes[0,0])
sns.boxplot(x=df_join['pos'], color='palevioletred', ax=axes[0,1])
sns.boxplot(x=df_join['neg'], color='palevioletred', ax=axes[0,2])

axes[0,0].set_xlabel('Overall Compound VADER Score')
axes[0,1].set_xlabel('Overall Positive VADER Score')
axes[0,2].set_xlabel('Overall Negative VADER Score')

sns.boxplot(x=df_join.loc[df_join['cluster']==corona_cluster, 'compound'], color='mediumseagreen', ax=axes[1,0])
sns.boxplot(x=df_join.loc[df_join['cluster']==corona_cluster, 'pos'], color='mediumseagreen', ax=axes[1,1])
sns.boxplot(x=df_join.loc[df_join['cluster']==corona_cluster, 'neg'], color='mediumseagreen', ax=axes[1,2])

axes[1,0].set_xlabel('Coronavirus Cluster \nCompound VADER Score')
axes[1,1].set_xlabel('Coronavirus Cluster \nPositive VADER Score')
axes[1,2].set_xlabel('Coronavirus Cluster \nNegative VADER Score')

# same axes - compound ranges -1 to 1; pos and neg range 0 to 1
axes[0,0].set_xlim(-1,1)
axes[1,0].set_xlim(-1,1)
axes[0,1].set_xlim(0,1)
axes[1,1].set_xlim(0,1)
axes[0,2].set_xlim(0,1)
axes[1,2].set_xlim(0,1)

plt.tight_layout()
In [92]:
# test for significant differences between cluster corona_cluster and all data
# compare compound VADER score 
mean_diff, conf_int, p_low, p_high = bootstrap(df_join['compound'], df_join.loc[df_join['cluster']==corona_cluster, 'compound'])
if p_low < 0.05:
    print('The compound VADER score significantly decreased by a mean difference of {:.3f}; p={}'.format(np.abs(mean_diff), p_low))
elif p_high <0.05:    
    print('The compound VADER score significantly increased by a mean difference of {:.3f}; p={}'.format(np.abs(mean_diff), p_high))
else:
    print('No significant change in compound VADER score; p={}'.format(p_low))
    
# compare pos VADER score
mean_diff, conf_int, p_low, p_high = bootstrap(df_join['pos'], df_join.loc[df_join['cluster']==corona_cluster, 'pos'])
if p_low < 0.05:
    print('The positive VADER score significantly decreased by a mean difference of {:.3f}; p={}'.format(np.abs(mean_diff), p_low))
elif p_high <0.05:    
    print('The positive VADER score significantly increased by a mean difference of {:.3f}; p={}'.format(np.abs(mean_diff), p_high))
else:
    print('No significant change in positive VADER score; p={}'.format(p_low))
    
# compare neg VADER score 
mean_diff, conf_int, p_low, p_high = bootstrap(df_join['neg'], df_join.loc[df_join['cluster']==corona_cluster, 'neg'])
if p_low < 0.05:
    print('The negative VADER score significantly decreased by a mean difference of {:.3f}; p={}'.format(np.abs(mean_diff), p_low))
elif p_high <0.05:    
    print('The negative VADER score significantly increased by a mean difference of {:.3f}; p={}'.format(np.abs(mean_diff), p_high))
else:
    print('No significant change in negative VADER score; p={}'.format(p_low))
The compound VADER score significantly decreased by a mean difference of 0.218; p=0.0
The positive VADER score significantly decreased by a mean difference of 0.097; p=0.0
The negative VADER score significantly increased by a mean difference of 0.031; p=0.0

In the coronavirus cluster, we see a significant decrease in positive sentiment, an increase in negative sentiment, and a decrease in compound (overall) sentiment compared to the whole group. As a check on VADER, we can compare this with the scores left by reviewers (review stars).
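
The `bootstrap` helper used above is defined elsewhere in the notebook and is not reproduced here. As a rough illustration of the idea, the following sketch resamples both groups with replacement and looks at the distribution of the difference in means. The function name `bootstrap_mean_diff`, its parameters, and the synthetic scores are hypothetical and may differ from the author's implementation.

```python
import numpy as np

def bootstrap_mean_diff(reference, group, n_resamples=2000, seed=0):
    """Hypothetical helper: bootstrap the difference in means (group - reference)."""
    rng = np.random.default_rng(seed)
    reference = np.asarray(reference, dtype=float)
    group = np.asarray(group, dtype=float)
    diffs = np.empty(n_resamples)
    for i in range(n_resamples):
        # resample each group with replacement, keeping its original size
        ref_sample = rng.choice(reference, size=reference.size, replace=True)
        grp_sample = rng.choice(group, size=group.size, replace=True)
        diffs[i] = grp_sample.mean() - ref_sample.mean()
    conf_int = np.percentile(diffs, [2.5, 97.5])  # 95% percentile interval
    p_low = np.mean(diffs >= 0)    # small when the group mean is reliably lower
    p_high = np.mean(diffs <= 0)   # small when the group mean is reliably higher
    return diffs.mean(), conf_int, p_low, p_high

# synthetic scores (assumption): a large reference group and a smaller, lower-scoring group
rng = np.random.default_rng(1)
reference = rng.normal(0.8, 0.3, size=1000)
group = rng.normal(0.55, 0.3, size=200)

mean_diff, conf_int, p_low, p_high = bootstrap_mean_diff(reference, group)
print('Mean difference: {:.3f}; 95% CI: [{:.3f}, {:.3f}]'.format(mean_diff, conf_int[0], conf_int[1]))
print('p_low={}, p_high={}'.format(p_low, p_high))
```

With a finite number of resamples, a reported p-value of 0.0 only means that no resample crossed zero, so it is better read as p < 1/n_resamples.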

In [93]:
plt.figure(figsize=(8,5))

# x= and y= are passed by keyword; newer seaborn versions no longer accept them positionally
sns.barplot(x=df_join['score'].value_counts(normalize=True).sort_index().index,
            y=df_join['score'].value_counts(normalize=True).sort_index(),
            alpha=0.5, color='palevioletred', label='Overall')
sns.barplot(x=df_join.loc[df_join['cluster']==corona_cluster, 'score'].value_counts(normalize=True).sort_index().index,
            y=df_join.loc[df_join['cluster']==corona_cluster, 'score'].value_counts(normalize=True).sort_index(),
            alpha=0.5, color='mediumseagreen', label='Coronavirus Cluster')
plt.title('Distribution of Review Stars Overall v. Coronavirus Cluster')
plt.xlabel('Review Stars')
plt.ylabel('Proportion of Stars')
plt.legend()
plt.tight_layout()
plt.savefig('corona_cluster_review_stars.jpg', dpi=500)

Here, we see a higher proportion of 1-star reviews in the coronavirus cluster than overall, and a lower proportion of 4- and 5-star reviews.
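
The same comparison can be read off a small table instead of overlaid bars. The sketch below uses a toy DataFrame in place of `df_join` (the rows and the cluster id 3 are made up for illustration) and puts the two star distributions side by side.

```python
import pandas as pd

# toy stand-in for df_join (assumption: 'score' holds stars, 'cluster' the KMeans label)
toy = pd.DataFrame({
    'score':   [1, 1, 2, 5, 5, 4, 5, 3, 5, 4],
    'cluster': [3, 3, 3, 3, 0, 0, 1, 1, 2, 2],
})
corona_cluster = 3

overall = toy['score'].value_counts(normalize=True).sort_index()
in_cluster = (toy.loc[toy['cluster'] == corona_cluster, 'score']
              .value_counts(normalize=True).sort_index())

# align on star value; stars missing from the cluster get a proportion of 0
comparison = pd.concat([overall, in_cluster], axis=1, keys=['overall', 'cluster']).fillna(0)
comparison['difference'] = comparison['cluster'] - comparison['overall']
print(comparison)
```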

In [94]:
# 1-star review corona cluster wordcloud
font_path = '/System/Library/Fonts/Supplemental/DIN Condensed Bold.ttf'
def color_func(word, font_size, position, orientation, random_state=None, **kwargs):
    return tuple(Reds_8.colors[random.randint(2,7)])

wc = WordCloud(font_path=font_path, 
               background_color="white", 
               width=1000, 
               height=600,
               max_words=500,
               max_font_size=300, 
               random_state=42)

plt.figure(figsize=(15,15))

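# join the reviews into one string; str() on a Series yields its truncated repr, not the full text
wc.generate(' '.join(df_join.loc[(df_join['cluster']==corona_cluster) &
                                 (df_join['score']==1), 'clean_review'].astype(str)))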
wc.recolor(color_func=color_func, random_state=3)

wc.to_file('1_star_covid_wordcloud.png')
plt.imshow(wc, interpolation='bilinear')
plt.axis("off")
plt.show()
In [95]:
# 5 star corona cluster review wordcloud
font_path = '/System/Library/Fonts/Supplemental/DIN Condensed Bold.ttf'
from palettable.colorbrewer.sequential import Greens_8
def color_func(word, font_size, position, orientation, random_state=None, **kwargs):
    return tuple(Greens_8.colors[random.randint(2,7)])

wc = WordCloud(font_path=font_path, 
               background_color="white", 
               width=1000, 
               height=600,
               max_words=500,
               max_font_size=300, 
               random_state=42)

plt.figure(figsize=(15,15))

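# join the reviews into one string; str() on a Series yields its truncated repr, not the full text
wc.generate(' '.join(df_join.loc[(df_join['cluster']==corona_cluster) &
                                 (df_join['score']==5), 'clean_review'].astype(str)))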
wc.recolor(color_func=color_func, random_state=3)

wc.to_file('5_star_covid_wordcloud.png')
plt.imshow(wc, interpolation='bilinear')
plt.axis("off")
plt.show()

Comparison of Sentiment by Cluster

I compare the sentiment of the coronavirus-like cluster to all individual clusters.

Return to Top: Coronavirus-Like Review Sentiment

In [96]:
colors = ['mediumseagreen' if i==corona_cluster else 'palevioletred' for i in range(13)]
cluster_palette = dict(zip(range(13), colors))
In [97]:
cluster_compound_vol = df_join.groupby('cluster').count()['compound']

fig, axs = plt.subplots(2, figsize=(15,10), sharex=True)
sns.boxplot(x='cluster', y='compound', data=df_join, 
            showfliers=False, ax=axs[0], palette=cluster_palette)
sns.barplot(x=cluster_compound_vol.index, y=cluster_compound_vol, 
            ax=axs[1], ec='k', palette=cluster_palette)

axs[0].set_title('Compound VADER Scores by Cluster')
axs[0].set_xlabel('')
axs[0].set_ylabel('Compound VADER Score')
axs[1].set_xlabel('Cluster')
axs[1].set_ylabel('Number of Reviews')

# apply the layout before saving so the saved figure matches what is displayed
plt.tight_layout()
plt.savefig('corona_cluster_compound_vader.jpg', dpi=500)
In [98]:
cluster_compound_bootstrap = bootstrap_all(df_join['cluster'].unique(), df_join, 'cluster', 'compound')

In [99]:
cluster_compound_bootstrap[cluster_compound_bootstrap['p-value_low']<0.05].sort_values('Mean', ascending=False)
Out[99]:
    Variable  Mean      Mean of Others  Mean Difference  95% CI                                         p-value_high  p-value_low
11  6         0.768914  0.790841        -0.021927        [-0.04840749345933512, 0.0032481048567610167]  0.9503        0.0497
6   0         0.735348  0.810777        -0.075429        [-0.09269807241195972, -0.05879886757125972]   1.0000        0.0000
8   3         0.570321  0.807377        -0.237057        [-0.27572792773638477, -0.198095595008483]     1.0000        0.0000

We can see that the coronavirus cluster (colored in green) has a significantly lower compound VADER score than the other clusters. Not only is this difference significant, it is also by far the largest of the three, at roughly 0.24 below the mean of the other clusters.
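
The "Mean", "Mean of Others" and "Mean Difference" columns in the table above follow a one-vs-rest pattern. The sketch below shows how such columns can be computed on a toy DataFrame; it leaves out the bootstrap step, and the notebook's `bootstrap_all` helper (defined elsewhere) may be implemented differently.

```python
import pandas as pd

# toy stand-in for df_join (assumption: three clusters, two reviews each)
toy = pd.DataFrame({
    'cluster':  [0, 0, 1, 1, 2, 2],
    'compound': [0.9, 0.7, 0.8, 0.6, 0.1, 0.3],
})

rows = []
for cluster in sorted(toy['cluster'].unique()):
    in_cluster = toy['cluster'] == cluster
    mean_in = toy.loc[in_cluster, 'compound'].mean()       # this cluster only
    mean_others = toy.loc[~in_cluster, 'compound'].mean()  # every other cluster pooled
    rows.append({'Variable': cluster,
                 'Mean': mean_in,
                 'Mean of Others': mean_others,
                 'Mean Difference': mean_in - mean_others})

one_vs_rest = pd.DataFrame(rows)
print(one_vs_rest)
```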

In [100]:
cluster_compound_vol = df_join.groupby('cluster').count()['pos']

fig, axs = plt.subplots(2, figsize=(15,10), sharex=True)
sns.boxplot(x='cluster', y='pos', data=df_join, 
            showfliers=False, ax=axs[0], palette=cluster_palette)
sns.barplot(x=cluster_compound_vol.index, y=cluster_compound_vol, 
            ax=axs[1], ec='k', palette=cluster_palette)

axs[0].set_title('Positive VADER Scores by Cluster')
axs[0].set_xlabel('')
axs[0].set_ylabel('Positive VADER Score')
axs[1].set_xlabel('Cluster')
axs[1].set_ylabel('Number of Reviews')

# apply the layout before saving so the saved figure matches what is displayed
plt.tight_layout()
plt.savefig('corona_cluster_pos_vader.jpg', dpi=500)
In [101]:
cluster_pos_bootstrap = bootstrap_all(df_join['cluster'].unique(), df_join, 'cluster', 'pos')

In [102]:
cluster_pos_bootstrap[cluster_pos_bootstrap['p-value_low']<0.05].sort_values('Mean', ascending=False)
Out[102]:
    Variable  Mean      Mean of Others  Mean Difference  95% CI                                         p-value_high  p-value_low
1   9         0.358224  0.383031        -0.024807        [-0.03440399241927935, -0.015356211757273303]  1.0           0.0
4   5         0.348563  0.383247        -0.034684        [-0.04475145625533093, -0.024522258276104703]  1.0           0.0
6   0         0.343410  0.397200        -0.053790        [-0.05965073812392859, -0.04789274822962352]   1.0           0.0
11  6         0.312390  0.390070        -0.077680        [-0.08477799303346391, -0.07039984300380092]   1.0           0.0
8   3         0.284360  0.389655        -0.105294        [-0.11523236903912609, -0.09562735946312548]   1.0           0.0

Again, we see that the coronavirus-like cluster has a significantly lower positive VADER score than the other clusters.

In [103]:
cluster_compound_vol = df_join.groupby('cluster').count()['neg']

fig, axs = plt.subplots(2, figsize=(15,10), sharex=True)
sns.boxplot(x='cluster', y='neg', data=df_join, 
            showfliers=False, ax=axs[0], palette=cluster_palette)
sns.barplot(x=cluster_compound_vol.index, y=cluster_compound_vol, 
            ax=axs[1], ec='k', palette=cluster_palette)

axs[0].set_title('Negative VADER Scores by Cluster')
axs[0].set_xlabel('')
axs[0].set_ylabel('Negative VADER Score')
axs[1].set_xlabel('Cluster')
axs[1].set_ylabel('Number of Reviews')

plt.tight_layout()
In [104]:
cluster_neg_bootstrap = bootstrap_all(df_join['cluster'].unique(), df_join, 'cluster', 'neg')

In [105]:
cluster_neg_bootstrap[cluster_neg_bootstrap['p-value_high']<0.05].sort_values('Mean', ascending=False)
Out[105]:
    Variable  Mean      Mean of Others  Mean Difference  95% CI                                        p-value_high  p-value_low
8   3         0.086582  0.053060        0.033523         [0.027462615194596487, 0.03980274963183582]   0.0000        1.0000
11  6         0.067091  0.054290        0.012801         [0.008693571527281441, 0.016965500965022494]  0.0000        1.0000
6   0         0.066121  0.051361        0.014759         [0.011487738505179581, 0.018037385036719197]  0.0000        1.0000
9   1         0.065238  0.055457        0.009781         [0.000981606120026473, 0.018856122839270236]  0.0186        0.9814

Here, we see that the coronavirus-like cluster has a significantly higher negative VADER score than the other clusters.

Sample Reviews for the Coronavirus-Like Cluster

Return to Top: Coronavirus-Like Review Sentiment

In [106]:
# 5 reviews with the lowest compound scores in the coronavirus cluster
# the first time I ran this, the coronavirus cluster was cluster 12, which is where any references to 12 come from
# keep a list of the reviews shown so they can be excluded easily later
first_neg_reviews = []
for idx, row in df_join[(df_join['cluster']==corona_cluster)].sort_values('compound').head(5).iterrows():
    first_neg_reviews.append(idx)
    print('\n---City: {}---'.format(row['city_x']))
    print('Restaurant Name: {}'.format(row['name']))
    print('Review Score: {}'.format(row['score']))
    print('Review Date: {}\n'.format(row['publish_date']))
    print(row['description'])
---City: Passaic, New Jersey---
Restaurant Name: The Cheesecake Factory
Review Score: 1.0
Review Date: 2020-04-18 00:00:00

Normally cheesecake is fine besides them forgetting cake or items on pick up which has been an issue but today is different. I placed an order online for curbside pickup, when you go onto the site it says something along the lines of "all orders will be curbside pick up and you will be texted when your order is ready" So I place my order and the message on my order as well as in email form says come inside to bakery counter, upon several calls to clarify during the pandemic no one answered as per their usual unprofessionalism/poor customer service, after a while I got a fax machine sound so I hung up. Upon arriving to a load of cars I go inside to way too many people probably also confused to what to do. I wait in a long line to get to the front and after 20 minutes they tell me they can't find the order, I tell them what it is and they have no recollection and tell me to tell them what the order is and have to remake it. By now I'm furious and after a while they finally tell me to wait in my car and they will bring it to me. During all of this one of the runners had no mask on and kept taking it off. This is piss poor service on a regular day but on take out only during a PANDEMIC not only did the staff, management and corporate all fail but they failed miserably to protect the lives of their patrons during this terribly hard time. 

I am an essential worker on the frontlines fighting corona so before you say it's hard on everyone blah blah I am there and way deeper into this than sitting in. Restaurant making mistakes that CAN and SHOULD be avoided to protect lives. This is not joke, not a game and these things can not happen while hundreds are losing their lives a day in this area, fix it or close down until after this is over so you can go back to screwing up orders, not now. 

Avoid this place like the plague AT LEAST until this is over, they have lost my business forever. Anyone familiar w me or my reviews knows i typically never leave a 1 Star or write something like this. Awful.

---City: Union, New Jersey---
Restaurant Name: Cathay 22
Review Score: 1.0
Review Date: 2020-04-09 00:00:00

Unfortunately, this is the lowest rating I can give. But if I were to give lower I 100% would give a -5. The man answering the phone was rude, sarcastic, condescending, and belittling. The smoked duck was clearly reheated multiple times. The meat was extremely tough and drowned in soy sauce and there was no moisture in the meat otherwise, plus no pancakes were provided. The egg rolls were extremely greasy and the shrimp was under cooked inside of it. The fried rice was just white rice with vegetables and vegetable oil. Additionally, the spare ribs were under cooked and greasy and soaked in nasty water. This was despite them being ordered well done. Not only that, but when there was a problem with our order and we called back, the same man who answered the phone previously was again rude and being extremely stubborn and unreasonable. This establishment is an abomination, and I will never eat this disgusting and tremendously horrible food ever again for as long as I live.

---City: Philadelphia, Pennsylvania---
Restaurant Name: Han Dynasty
Review Score: 2.0
Review Date: 2020-03-27 00:00:00

COVID TAKEOUT REVIEW
Businesses are definitely struggling during these times and it saddens me to write anyone a negative review. However, I ordered fried rice from this location last night and had extremely unpleasant food poisoning since. The rice tasted stale/old and definitely funny, but I ignored it and chalked it up to weird flavoring. Big mistake. When rice specifically  is kept too long, it tends to breed a specific bacteria (b cereus) resulting in very painful food poisoning. It must be very hard to keep a business going right now, but old food, ESPECIALLY rice is not acceptable. Very disappointed in last nights meal. Won't be ordering from here.

---City: Union, New Jersey---
Restaurant Name: Picante
Review Score: 1.0
Review Date: 2020-03-07 00:00:00

Worst customer service ever, the employees along with the owner are the rudest and the most ignorant people that Ive encountered in my life, I simply called in for an order of a Veggie Nachos and a beef Nachos I specifically Asked for Beef! We get home to find out that they messed up our ordered and gave us Pork Nachos, I called to ask if I could drive by and pick up another Veggie or Beef Nachos! I explained to the owner that being Muslim we can't eat pork at any circumstances! But she was very rude and refused to change our order or to give us a refund!We end up throwing both away! I was very upset at the way she spoke to me!  it's a shame that people still ignorant I'm 2020.

---City: Wayne, Michigan---
Restaurant Name: Texas Roadhouse
Review Score: 1.0
Review Date: 2020-03-08 00:00:00

sat the 7th ordered dinner to go ordered the bbq chicken for dinner girl checked our order before we left found an item missing at 430 pm started dinner found I got a pork chop instead of chicken bummer called the rest and three times put on hold until brie tells me the manager probably wont handle my call or complaint because they were busy explained to her I had already paid and that was a poor excuse asked her don't put me on hold and tell a manager im on the phone miracle there was one in shouting seriously yelling in my ear to meagen that she doesnt have time for me Meaghan says shes a manager and sorry for the inconveinance could I come back I live 15 miles away and you checked my order maybe she says Tuesday ill be taken care of to check my email call if I haven't received a voucher wth doubt ill see it doubt ill go back just terrible btw my take out order that day was number 3

Several of the negative reviews do not explicitly mention COVID-19 and may have been written before those restaurants changed their business strategies; the others, however, share consistent themes of price, food quality, and service.
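
To separate reviews that mention COVID-19 explicitly from those that only share vocabulary with the cluster, the raw text can be checked against the term list. This is a minimal sketch: the `mentions_covid` column and the toy reviews are hypothetical, while the term list is the `coronavirus_words` list used earlier.

```python
import re
import pandas as pd

coronavirus_words = ['covid', 'coronavirus', 'corona', 'covid-19', 'virus', 'pandemic', 'quarantine']

# whole-word, case-insensitive match on any of the terms
pattern = re.compile(r'\b(?:' + '|'.join(re.escape(w) for w in coronavirus_words) + r')\b',
                     flags=re.IGNORECASE)

# toy stand-in for the 'description' column of df_join
toy = pd.DataFrame({'description': [
    'Curbside pickup during the pandemic was a mess.',
    'The spare ribs were undercooked and greasy.',
    'COVID TAKEOUT REVIEW: the rice tasted stale.',
]})

# hypothetical flag column: True when the review names a coronavirus term
toy['mentions_covid'] = toy['description'].apply(lambda text: bool(pattern.search(text)))
print(toy)
```

Filtering on such a flag, or on `publish_date`, would make it possible to check whether the lower sentiment holds for reviews that clearly refer to the pandemic.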

In [107]:
# 5 reviews with the highest compound scores in the coronavirus cluster
first_pos_reviews = []
for idx, row in df_join[(df_join['cluster']==corona_cluster)].sort_values('compound', ascending=False).head(5).iterrows():
    first_pos_reviews.append(idx)
    print('\n---City: {}---'.format(row['city_x']))
    print('Restaurant Name: {}'.format(row['name']))
    print('Review Score: {}'.format(row['score']))
    print('Review Date: {}\n'.format(row['publish_date']))
    print(row['description'])
---City: Macomb, Michigan---
Restaurant Name: P.F. Chang's
Review Score: 4.0
Review Date: 2020-04-13 00:00:00

I was excited weeks ago to participate with "Yelp's Big Night-In" Event on April 11.  Thanks to our Yelp coordinator, Annette J and others it was the perfect way of "Not Cooking" on a Saturday night.  What a great idea to help support any local business in my area that was open for carryout or delivery due to the pandemic COVID-19.  
* FYI: I always like to enjoy 3 of my favorite Chinese food places yet only this one is currently Open. The other 2 are temporarily closed so I decided to go with PF. 

So here's my "Dine-In" Review:
I decided on P.F. Chang's due to my craving for Chinese and also to help and assist this well-known local business on M-59 (Hall Rd) near Partridge Creek Mall. Instead of my phone app I found it quicker and easy to place my order online with my laptop so to view their easy menu options.  Also the website for P.F. Chang's was already set up with an area called "Online Orders" so click there.  It was detailed and I completed my order in 15 minutes. My timeline to pick up was 40 minutes and I was given a time as well which was fair and acceptable.

It was cheaper by $10.00 to order the Family Meal For 2 plus a choice of soup and a noodle entree was included.
I ordered as a side of the Hand-Folded (6) Crab Wontons with spicy plum sauce for $9.00.  They were tasty and crunchy but not enough crab filling for Me.  

My Family Meal for 2:  $32.00

1 Chicken Lettuce Wrap was plenty with 8 leaves of lettuce as always being their signature dish.
1 Wonton Soup was okay with plenty of wontons, mushrooms, onions, water chestnuts and spinach but tasted only like beef broth, compared to other Chinese places in regards to the main flavor. No crackers or noodles to add to the soup.  Not sure about that since the menu online did not show included or ask if I requested them. ?

1 Noodle choice was Chicken Lo Mein in a large 6-oz bowl. 
My eyes were delighted to see the crunchy vegetables and chunks of chicken as well. My Taste buds were happy with the flavor of spices in the bowl. (Leftovers were enjoyed for my lunch today on Monday.)

1 Mongolian Beef was prepared with sliced flank steak seared a soy-glazed sauce and garlic.  It was cooked perfect and tender with a delicious flavor to enjoy with white rice.  On top were strips of green onions which were bland and not eye appealing.  Maybe green onion slices as in the soup would of complimented the entree.

The whole process from online ordering to curbside pick up was very organized and no set-backs or issues to complain about.
I arrived at 6:25 pm to pick up at 6:28 time and there were 3 vehicles ahead of Me but the speed of pick up made the line move fast with great employees handling the process of curbside. 

Payment:  Instead of a credit card online the Restaurant asked for Payment Directly upon carry out or curbside pick up.
I did have to get out of my SUV after calling that I was in line to pick up, to wait for my paperwork then pay before receiving my bag of food. I also printed out my order of confirmation to verify. 

Employee at the back door for payment and carry out was "Michaela."
I will always ask a person's name when I am pleased with service or their attitude, so I can acknowledge them in a review or phone call.
*I also gave her a cash tip for her speed!*

Their "Careful Packaging" of my whole carry-out order was Fantastic with great sealed containers plus no spills!  My food was still Hot also after my 10 minute speedy drive home!
(Heat cranked on the floor too) lol

Summary:  A "Fun Yelp Dine-In" Event to join and experience tasty food, fast online ordering, plus quick easy pick up.  Most of all no need to dress up... plus no cooking of pots and pans in my kitchen on a Saturday night. 

It was worth the time &amp; money to help a local business who was honored to serve Me during a difficult time. (This comment line was actually printed on my order received email).  :-)

Will there be another Yelp Dine-In this month on a different day or night?
I certainly hope so since the Governor of Michigan "Stay At Home" rule is currently in effect till May 1, 2020.

---City: Fairfield, Connecticut---
Restaurant Name: Southport Diner
Review Score: 5.0
Review Date: 2020-03-04 00:00:00

When Healthy meets delicious!!! 
Words are not enough to express how good, delicious and healthy the food that is being served in this newly reinvented modern and beautiful diner. 
I love comfort food, I mean who doesn't?! 
On the Glorious Sunday morning I decided to treat myself and not spent time cooking but have the breakfast delivered to me. I use seamless to place my order and as I was browsing the menu I was delightfully impressed by how many healthy choices this diner has to offer. I decided on the power boost omelette that comes with grilled chicken and avocado. As I was waiting for my order I received a call from Tony the owner of the diner  letting me know that my order is ready to be picked up and He kindly suggested to use Uber eats instead for my future orders as they pick up and deliver the food faster while it's hot compare to other food delivery services. How thoughtful and caring?! As my food arrived I unwrapped and OMG I can tell by the looks of it that it's going to be an epic meal but when I took my first bite of the omelette I wanted to cry of a joy as it tasted so freaking good and delicious. The chicken was cooked to perfection, I Mean It. Overall I am so delighted and happy about my experience with this restaurant and I can't emphasize enough how delicious and healthy the food is. When you get a good news you want to share it with everyone so here it is EVERYONE who enjoys good and healthy food should try Southport Diner!!!!! 
They Rock!!!!

---City: Bergen, New Jersey---
Restaurant Name: Manjal Indian Fusion
Review Score: 5.0
Review Date: 2020-03-14 00:00:00

After ordering from Manjal a few times now, and realizing that each time we get it, the food seemed to just get better and better, I figured it was time to leave this review.

Last night we ordered  Paneer Tikka Masala, Goan Shrimp, Gobi Manchurian, 2 orders of Naan (butter &amp; garlic),  Keema Samosa, and a Mango Lassi (as well as a whole lot of rice)...for two people (yes, we were hungry). Needless to say, this was enough food for us to eat for dinner for 2 nights, and after doing so, we still have a whole lot of rice. Every single thing we ordered was delicious, and we totally enjoyed it two nights in a row. 

The Paneer Tikka Masala...I know you can't go wrong with this dish almost anywhere. It's just...lovable. Manjal's, however, is over the top delicious. Spicier than most I've had before, and with so much magical flavor pushing its already luscious texture to next level tasty. I truly love it.

My husband got the Goan Shrimp, which I didn't have a taste of, but he loved it.

The Keema Samosa was SO good. I've never had a Samosa with meat in it before, but after trying this, I'm pretty convinced that lamb absolutely belongs in a Samosa. Amazing.

I'm a sucker for all things cauliflower, and the Gobi Manchurian did not disappoint. Spicy and sweet and sticky and perfect. Such a great appetizer, and a nice generous portion.

Naan, an absolute necessity for soaking up the amazing sauces from our Tikka Masala and Goan Shrimp, is a treat on its own. Perfectly cooked, and amazingly still perfect when we toasted it to reheat with a little crisp. Love it.

And the Lassi? I swear I'd drink this every day. Just sweet enough, smooth, creamy, refreshing and delicious. Wish I had another today with our leftovers meal!

We will be back: many times. We're lucky to live so close to such an awesome restaurant.

---City: Cook, Illinois---
Restaurant Name: Gorée Cuisine
Review Score: 5.0
Review Date: 2020-04-17 00:00:00

In support of small business and Senegalese culture, I welcomed this place with open arms and left with a full stomach and heart. The restaurant even gave us free plantains to-go! This place is GREAT for group outings and perfect for family-style sharing. And most of all, the people here do their best to greet you with warmness. Our waitress was superb, giving solid recommendations and generous samplings of juices. Oh, and there's easy free street parking, and you likely won't need reservations. 

Chef's recommendations included Dibi Lamb, Yassa Lamb, Yassa Shrimp, Saka Saka, and Maffe. Collectively I felt that these dishes met all of my palate needs. Some were sweeter, some were saltier, and some more spicy but all were mind-blowing. The Yassa shrimp was a flavorful dish packed with "secret spices" complimented by the slightly spicy but very tasty Jollaf rice. The Dibi Lamb was tender, fresh, and not too heavy. The Maffe (which consists of lamb cooked in a creamy peanut butter and tomato sauce with potatoes, carrots, and yams) was the perfect sweet compliment to its saltier counterparts. It was creamy without the guilty aftermath. When we were preparing to leave, the restaurant offered us a free entree and plantains to-go (since a guest had canceled an extra order). We appreciated the gesture. 

To prepare our palates for the entrees, some of us ordered senegalese mint tea while others ordered the freshly made ginger/pineapple juice. Both were healthy and tasty and not overpowering. 

Highlights: order a yassa or dibi dish of some kind as these are highlights. I recommend the lamb or shrimp. Don't forget the plantain appetizers. 
 
Caveat: exceptional food sometimes comes at a small price. The food took longer to prepared than I expected, but I didn't mind it at all. If you're in a hurry, be wary of this. 

Have since recommended this place ample times, and everyone has love this place. Excited to dine here multiple times.

---City: Wayne, Michigan---
Restaurant Name: Ten Yen Restaurant
Review Score: 5.0
Review Date: 2020-03-18 00:00:00

I was visiting a sick relative with ALS in January.  As a surprise, I wanted to buy dinner at his favorite place.  He was craving Chinese food so "Ten Yen" was the delicious choice.   It was easy and helpful to look over the carry out menu they already had at home. Photos of menu are attached.

It was easy to place our order over the phone with William who spoke English well. :-) (humor:  yes at times they talk fast)

He was kind enough to repeat my carry out order back to Me so, a careful efficent guy!
It was also nice to meet him in person at his family business and owner when picking up our food.  Photos attached.

Since we won't be able to sit down in this place due to his sad illness, I took photos of the inside to post as well. "Helper Yelper= Karen!"   There were guests eating in the dining room area as well.  Your welcome.  

The prices are affordable and the portion sizes were huge so different leftovers to enjoy for the next day.  Oh yeah with a smile :-).   
Meals delicious:  Crispy chicken, perfectly cooked shrimp, tender beef, white and fried rice in large containers, plus three tasty sauces put in separate containers.

The soups were delicious and definitely homemade with crunchy noodles!  My favorite was the egg drop plus I tried a small sample of the wonton soup.  An Egg roll was included in each combo meal.  All the food items were packaged tightly in containers then bagged up and easy to travel back unspilled in a cardboard box.

The owner William was very pleasant upon my walking in, paying cash, and thanked Me for coming in and to enjoy my meals while leaving.  He also mentioned "I gave you extra fortune cookies".... Humor or I need some good luck?!  :-)

I would highly recommend this Chinese restaurant when in Westland or Plymouth, Michigan located on Wayne Rd.  You can miss it because its in a mini strip mall.  So drive slow when looking for, or use Waze. It's near a gas station at the traffic light.

P. S. I my fortune cookie was: " So True" and of course photo attached!

Several of the top positive reviews appear to predate the lockdowns. As an aside, I would guess that review length is associated with review sentiment: short reviews seem more likely to be neutral, while the most positive reviews shown here are very long.
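One way to check the length hunch is to rank-correlate word count with the absolute compound score. The sketch below is a minimal illustration on a made-up four-row frame, not a result from this dataset; it assumes the `description` and `compound` columns used elsewhere in this notebook, and `n_words` and `abs_compound` are hypothetical helper columns.

```python
import pandas as pd

# Toy stand-in for df_join (values are invented for illustration only)
toy = pd.DataFrame({
    'description': [
        'Great food.',
        'It was fine.',
        'Absolutely wonderful meal with friendly staff, generous portions, '
        'and quick delivery during a hard time.',
        'Terrible. Cold food, rude staff, and a two hour wait for a simple '
        'order that arrived wrong.',
    ],
    'compound': [0.62, 0.20, 0.95, -0.85],
})

# Review length in whitespace-separated words
toy['n_words'] = toy['description'].str.split().str.len()

# Strength of sentiment regardless of direction (near 0 = neutral)
toy['abs_compound'] = toy['compound'].abs()

# Spearman correlation computed as the Pearson correlation of the ranks
rho = toy['n_words'].rank().corr(toy['abs_compound'].rank())

print(toy[['n_words', 'compound', 'abs_compound']])
print('Spearman rho: {:.2f}'.format(rho))
```

On the real data, the same three lines could be run on `df_join` directly; a positive rho would be consistent with longer reviews carrying stronger sentiment.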

Here, I look again at both negative and positive reviews, but only those left since April 1, 2020. I've excluded reviews we've already seen.

In [108]:
newest_reviews = df_join[(df_join['publish_date']>'2020-04-01') & (~df_join.index.isin(first_neg_reviews))]
# 5 most negative reviews (lowest compound scores) in the coronavirus cluster, published after April 1
for idx, row in newest_reviews[(newest_reviews['cluster']==corona_cluster)].sort_values('compound').head(5).iterrows():
    print('\n---City: {}---'.format(row['city_x']))
    print('Restaurant Name: {}'.format(row['name']))
    print('Review Score: {}'.format(row['score']))
    print('Review Date: {}\n'.format(row['publish_date']))
    print(row['description'])
---City: Essex, New Jersey---
Restaurant Name: Cuban Pete's
Review Score: 1.0
Review Date: 2020-04-18 00:00:00

What horrible management! First time I ordered from here (on seamless) and it took almost two hrs! My order totaled almost $100 I understand the hardship right now with  the virus and everything happening which is why I wasn't rude but when I called to see what the issue was the person didn't even ask for my name to see how much longer my order would be. He was so nasty and just told me the driver needs to come in to get the food(when she already called me and told me she went in twice) when I called back to ask after the third time the driver went in the guy said the driver just went out "good luck" ....what the hell does that even mean! I will never order from here again. Fine the food took two hrs no prob but the guy with his rudeness was unacceptable.

---City: New Haven, Connecticut---
Restaurant Name: Hook & Reel Cajun Seafood & Bar
Review Score: 2.0
Review Date: 2020-04-18 00:00:00

This is the 4th time I'm coming to hook and reel in orange ct &amp; this visit was horrible! We ordered 2 reels catch &amp; 3 make your own bags. The problem arrived when we ordered and waited 30 mins before returning to pick up our food. Once inside we waited for a while to pay which is okay knowing the crisis with COVID-19 is going on so they are overwhelmed with orders. Our order was separated in 2 bags. The first bag was missing one of the reels catch so we brought it back in the restaurant to show them the mistake. The manager proceeded to go through the bag and bring one of the duplicated (make your own bags ) to the kitchen. Which is not sanitary to bring something back to the kitchen if it already left the restaurant to the car. She brings the ( remade ) reels catch back pretty quick which raised red flags. I proceeded to ask her if she just added to the bag she brought back and she argued with us that is not what she did. There was no true apology for the mistake or care to be honest. This was the worst take out service I've ever experienced. Especially for paying over $100 for food. Highly disappointed!

---City: Union, New Jersey---
Restaurant Name: Don Alex Restaurant
Review Score: 1.0
Review Date: 2020-04-09 00:00:00

We really wanted Peruvian food and figured we try a new place. I called to ask if open and if there was a menu online. Woman who answered didn't speak English and didn't understand me. 

Decided to take the ride and order in person. Walk in and only 1 person working the front which makes sense during the quarantine! The second I walked in he starts talking to me in Spanish. When he finally stops I tell him I can't speak Spanish. He looks at me sadly and says "oh". Now 2 different people and 2 don't speak English. I have to point to my items on the menu that i want. Food is made quickly. 

I ordered the arroz chaufa beef and chicken. The price is similar to everywhere else $15 but you get no where near as much food and it might be the worst Peruvian fried rice i ever had. The container was the size of a side dish anywhere else. I would say maybe half as much as i have gotten at all other places. The rice was dry, some was burned and hard, there was no flavor and it was very greasy. Barely any chicken, beef or egg. It was basically all rice and scallions! I was so disappointed but it makes sense looking at their rating on yelp and Google both around 3 stars.

Basically overpriced, no flavor, half the size portions of other places and they can't communicate with English speaking potential customers!

---City: Los Angeles, California---
Restaurant Name: Birdies
Review Score: 1.0
Review Date: 2020-04-04 00:00:00

Coronavirus ?!?! They forgot an item in our order. Rude owner blames it in the uber eats driver, said they've been eating customers food out of the bag. During the coronavirus epidemic,  birdies is the only restaurant in los angeles not sealing their togo bags. Containers were open. Disgusting. open bags. Seal your takeout bags, CORONAVIRUS !!!!

---City: Middlesex, Massachusetts---
Restaurant Name: La Tapatia Taqueria
Review Score: 1.0
Review Date: 2020-04-18 00:00:00

We have ordered here multiple times and they always forget something. Tonight they forgot a burrito in our bag. Went back and got it and it was smothered in sour cream even though we said multiple times over the phone NO sour cream and NO cheese. Some people have dairy allergies. AND there was a HAIR in another burrito. Disgusting. Never ordering from here again these people are incompetent.
In [109]:
newest_reviews = df_join[(df_join['publish_date']>'2020-04-01') & (~df_join.index.isin(first_pos_reviews))]
# 5 most positive reviews (highest compound scores) in the coronavirus cluster, published after April 1
for idx, row in newest_reviews[(newest_reviews['cluster']==corona_cluster)].sort_values('compound', ascending=False).head(5).iterrows():
    print('\n---City: {}---'.format(row['city_x']))
    print('Restaurant Name: {}'.format(row['name']))
    print('Review Score: {}'.format(row['score']))
    print('Review Date: {}\n'.format(row['publish_date']))
    print(row['description'])
---City: Passaic, New Jersey---
Restaurant Name: Cumin N Eat
Review Score: 5.0
Review Date: 2020-04-11 00:00:00

We have been looking for a good Indian restaurant in our area. Welcome to Secaucus! Thank you so much for being open at this time and for your delicious food and your quick delivery. We had a lamb appetizer , vegetable korma, vegetable biryani, and aloo gobi- cauliflower dishes.  Wow!!
There's nothing like the robust spice of the Indian cuisine.  My palette is still alive with wonderful spicing. 

The lamb was perfectly stewed and spiced and was tender and moist.  We found the sauce of the vegetable korma slightly too thick and a smaller portion by comparison to the other entree but enjoy the taste very much.  The rice dish was alive with a cacophony of spices !! Wow!!  It was the most flavorful and my favorite dish.  The cauliflower was gifted to us as was some raita and a ginger sweet carrot desert.  That was very kind.    The spice and heat on the cauliflower was perfect!  I usually like my raita simply prepared, their version had red pepper / onions, interesting choices.  I like the taste but I think it's over the top, just yogurt and cucumber works for me.   I am allergic to carrots so I didn't eat the desert.   But that was nice of them to include it in my order. 

The delivery was prompt. The transaction of payment and tip were done in advance.  The driver had a mask and gloves in when they dropped the food off at our door.  The driver phoned me when he arrived.  It was safe and easy and I appreciated their attention to our CDC Guidelines. 

I can't wait to order again.

---City: Cook, Illinois---
Restaurant Name: il Porcellino
Review Score: 5.0
Review Date: 2020-04-02 00:00:00

We ordered delivery to support local business during the pandemic, so for less cooking. It is great! I would go there in person when the restaurant re-opens.

Cheese bread is great, very fluffy and tasty, even the marinara sauce is great! I like it better than the focaccia, which is very crunchy just because I like soft and fluffy bread.

The orecchiette is amazing! It does have garlic and I don't like garlic, but they left them in large chunk so it is easy to pick out. I would definitely order this every time I go or order delivery.

The spaghetti and meatball is good, just what you expect they would taste like, the meatball is soft and the meat is good. 

The Rigatoni with chicken is good too.

They all came in standard size takeout container. I didn't have all of the rigatoni on the plate because I saved some for tomorrow :) 

The asparagus is exactly 8 asparagus for $7.95, but it does taste good. I just wished there would be more.

Overall this was a very pleasant experience and I would definitely go again!

---City: Rockland, New York---
Restaurant Name: Da Nina's Italian Restaurant
Review Score: 5.0
Review Date: 2020-04-04 00:00:00

Ordered takeout and had a delicious dinner delivered. Food is just as great as when you went there to eat it. I wish for safety and health for all those still working there. Looking forward to dining in again, but this was a great substitute for the time being!


Delicious Italian food! As a Vegan, it's hard to find a place that takes the time to look at their menu and create something for you. Our server, Abel, was fantastic. He suggested the Gnocchi dish with plain tomato sauce (the menu version comes prepared with meat and cheese). I didn't want to come up with a bunch of tweaks to a menu item, ask a ton of questions and be afraid that they'd take me seriously. I was ready to order 4 sides to create a meal, but Abel's suggestion made my dinner special. (Speaking of sides, their spinach dish is the best I've ever had. Anywhere. The spinach itself has a grilled flavor, and is sautéed with olive oil and garlic. Their sautéed hot peppers were sooo good too- they're hot temperature-wise and heat-wise. Not recommended for) I dined with a non-Vegan, and he loved his food too. The restaurant is cozy and warm, and the staff is wonderful and welcoming. Can't wait to go back!

---City: Los Angeles, California---
Restaurant Name: Three Borders Brunch & Grill
Review Score: 5.0
Review Date: 2020-04-04 00:00:00

Today we stopped by Three Borders Brunch &amp; Grill for Grab n' Go.  As this pandemic continues Los Angeles restaurants can only do Grab n' Go.   Hence we called in our order, picked it up about 20 minutes later.  The food here is consistently 5 star quality.  The warm owner, staff working here remain 5 star as well.   It is really important we support exceptional independent restaurants like Three Borders during these tough times.  Typically they are packed on a Saturday yet today we appeared to be the only people picking up food.  So sad because this is one of the absolute best restaurants in the area.

Shrimp Tostada - Lovely fresh warm tostadas covered with tons of shrimp, flawlessly prepared vegetables.  There was enough food here for 2.  At $14 it is a solid deal.  Look at my photos and you shall see.  

Tacos Dorado w/Carne Asada - Perfect tacos dorado served warm and crisp.  Once again tons of delicious food one can easily share.

Pineapple Agua Fresca - Consistently the absolute best agua fresca in Southern California.  The perk of taking it home?  You can mix it with Bacardi rum, create a fabulous cocktail.  I'm drinking it right now as I type this review.

Chips w/Guacamole - Delicious and on the house!  They even threw in a second agua fresca.  Wow!  This is customer service and Latin American hospitality at it's very best.  This is how you keep folks like us coming back again (and again).

This is amazing grab n' go food at times like this.  For a bit over $30 we got tons of food.  Will I be back?  Dumb question.  I come here all the time and today was as good as it gets.

If you can please go out of your way to support the delicious independent restaurants in Los Angeles.  They need to survive so that when this pandemic is over we can once again frequent them.

---City: Cook, Illinois---
Restaurant Name: Butterdough
Review Score: 5.0
Review Date: 2020-04-18 00:00:00

OUTSTANDING!!!! I'm all about supporting small businesses during this nightmare and this place deserves all the business possible. I searched for coffee on doordash and found this absolute gem. I ordered a variety and I can't wait to order from here again.
 2 for $4 Bacon Egg &amp; Cheese on Housemade Biscuits: Take the price up to $6 and I'm still all in. The biscuit had the rustic outside while the inside was as soft and buttery as an abuelitas hug. The bacon and egg were cooked to perfection along with the right amount of quality cheese.  Seriously a top 3 breakfast sandwich and as a fat guy I should know. 
Jamon y Queso Croissant: Quality ham off the bone, perfectly melted cheese on a masterpiece of a croissant.  
Canelada: Another beautiful (yes I called food beautiful) croissant with the right amount of cinnamon and sweetness that makes me wish I ordered more. 
Strawberry Glazed Donas: NOTHING BEATS A FRESH DONUT AND THIS DONUT IS LATE 80'S/EARLY 90'S RAPPER FRESH. BRAVO!!!!
Cold Brew w/Mexican Mocha: Four sips in and I was ready to time travel and stop the coronavirus.  Perfect blend of Coc/cin/brown sugar. You may think it was the coffee that caused this long review and maybe it helped but the quality top to bottom was the real inspiration.  
WANT TO FEEL SPECIAL, THEN ORDER FROM HERE!!!!

Geographical Distribution of Coronavirus-Like Reviews

To compare locations, I plot below the normalized share of reviews from each city, both overall and within the coronavirus cluster.

Return to Top: Coronavirus-Like Review Sentiment

In [110]:
plt.figure(figsize=(12,8))

# share of reviews per city: overall vs. coronavirus cluster only
overall_props = df_join['city_x'].value_counts(normalize=True).sort_index()
corona_props = (df_join.loc[df_join['cluster']==corona_cluster, 'city_x']
                .value_counts(normalize=True)
                .reindex(overall_props.index, fill_value=0))  # align cities so the bars overlay correctly

# keyword x/y arguments are required by newer seaborn versions
sns.barplot(x=overall_props.index, y=overall_props.values,
            alpha=0.5, color='palevioletred', label='Overall')
sns.barplot(x=corona_props.index, y=corona_props.values,
            alpha=0.5, color='mediumseagreen', label='Coronavirus Cluster')
plt.title('Distribution of Review Locations Overall v. Coronavirus Cluster')
plt.xlabel('City')
plt.ylabel('Proportion of Reviews')
plt.xticks(rotation=90)
plt.legend()
plt.tight_layout()

It appears that COVID-19-like reviews are less likely to come from Jefferson, LA (New Orleans); NYC; New Haven; and Philadelphia. I hypothesize that this is due to the heavier impact of COVID-19 in particularly dense areas, especially NYC, where eating out may have dropped as a result. Social distancing measures may similarly make take-out harder to get in densely populated areas, whereas no-contact takeout and delivery could be more feasible in more suburban areas.
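The over- and under-representation read off the plot can also be tabulated directly. The following is a minimal sketch on a toy frame with placeholder city labels, assuming the `city_x` and `cluster` columns and the `corona_cluster` variable from this notebook; the cluster value 12 and the name `diff` are illustrative.

```python
import pandas as pd

# Toy stand-in for df_join; city labels and cluster assignments are invented
toy = pd.DataFrame({
    'city_x': ['A', 'A', 'A', 'B', 'B', 'C', 'C', 'C'],
    'cluster': [12, 0, 0, 12, 12, 0, 0, 0],
})
corona_cluster = 12  # same variable name as the notebook; value is illustrative

# Share of all reviews coming from each city
overall = toy['city_x'].value_counts(normalize=True)

# Share of coronavirus-cluster reviews from each city, aligned to the same
# cities so that cities absent from the cluster count as 0
in_cluster = (toy.loc[toy['cluster'] == corona_cluster, 'city_x']
              .value_counts(normalize=True)
              .reindex(overall.index, fill_value=0))

# Negative = under-represented in the cluster, positive = over-represented
diff = (in_cluster - overall).sort_values()
print(diff)
```

Sorting the difference puts the most under-represented cities first, which makes the comparison easier to read than two overlaid bar charts.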


Modeling Conclusions

Using clustering, we're able to see that coronavirus-like terms do cluster together, along with terms such as "order", "time", "delivery", "support", "takeout", "service", and even "online" - which makes sense, given the drastically changed business model restaurants across the U.S. have been forced to adopt since early March.

This cluster has significantly worse sentiment than the other clusters. I initially thought this could be due to words like "pandemic" being classified as negative by VADER; however, I confirmed the lower sentiment using review stars. Of note, the lower sentiment is easy to see in a boxplot: the compound VADER score is substantially lower than that of the other clusters.
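The cross-check against review stars can be sketched as a per-cluster summary of both measures. This is a hedged illustration on invented numbers, not the notebook's actual results; it assumes the `cluster`, `score`, and `compound` columns, and `summary` and `lowest_by_stars` are hypothetical names.

```python
import pandas as pd

# Toy stand-in for df_join; all values are invented for illustration
toy = pd.DataFrame({
    'cluster':  [0, 0, 0, 1, 1, 1, 12, 12, 12],
    'score':    [5.0, 4.0, 5.0, 4.0, 5.0, 3.0, 1.0, 2.0, 5.0],
    'compound': [0.9, 0.7, 0.8, 0.6, 0.9, 0.1, -0.8, -0.5, 0.9],
})
corona_cluster = 12  # same variable name as the notebook; value is illustrative

# Mean and median of stars and VADER compound score for each cluster
summary = toy.groupby('cluster')[['score', 'compound']].agg(['mean', 'median'])
print(summary)

# Cluster with the lowest average star rating
lowest_by_stars = toy.groupby('cluster')['score'].mean().idxmin()
print('Lowest mean stars in cluster:', lowest_by_stars)
```

If the star ratings and the compound score rank the clusters the same way, the lower VADER sentiment is less likely to be an artifact of individual words such as "pandemic".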

Geographically, reviews that cluster with coronavirus-like terms seem to come from counties across the country, but less so from some of the regions most impacted, such as NYC.

Return to Top: Modeling of Reviews

Future Directions

To expand and improve this analysis, I would do the following in the future:

  • Rescrape Yelp data to cover the remainder of April through the present. The current dataset was scraped in mid-April and does not span more recent weeks.
  • Use cloud computing resources to run KMeans clustering on all available data.
  • Explore different numbers of clusters from the KMeans. Normally, cluster number can be determined by observing an elbow at a certain cluster number; however, I didn't observe that here and had to somewhat arbitrarily choose 13 clusters. Exploring different numbers of clusters to see how coronavirus terms cluster would be an interesting next step.
  • Remove restaurant "stop words", such as "place" and "good" - these terms are really common, and removing them might provide more insight into specific clusters.
  • Explore trends in neutral reviews. Throughout this analysis, I look mostly at compound, positive, and negative sentiment, but not neutral sentiment.
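Two of the items above, removing restaurant "stop words" and exploring different numbers of clusters, can be sketched together. The example below is a minimal illustration on eight invented reviews, not the pipeline used in this notebook: it swaps in scikit-learn's built-in English stop word list (an assumption, in place of the spaCy list imported earlier), adds "place" and "good" to it, and uses the silhouette score as one alternative to the elbow method. The names `restaurant_stop_words`, `scores`, and `best_k` are hypothetical.

```python
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer, ENGLISH_STOP_WORDS
from sklearn.metrics import silhouette_score

# Invented toy reviews standing in for the 'description' column
reviews = [
    'Good place, quick delivery and contactless pickup during the pandemic.',
    'Ordered takeout online to support a local business during quarantine.',
    'Delivery took two hours and the order was wrong.',
    'The pasta was rich and the tiramisu was a good finish.',
    'Lovely pasta and great wine at this place.',
    'Spicy curry, fluffy naan, and a sweet mango lassi.',
    'The curry was spicy and the naan was perfect.',
    'Takeout order was ready on time and pickup was easy.',
]

# Domain-specific stop words on top of the generic English list
restaurant_stop_words = {'place', 'good'}
stop_words = list(ENGLISH_STOP_WORDS.union(restaurant_stop_words))

vectorizer = TfidfVectorizer(stop_words=stop_words)
X = vectorizer.fit_transform(reviews)

# Fit KMeans for several cluster counts and record the silhouette score
# (ranges from -1 to 1; higher means tighter, better-separated clusters)
scores = {}
for k in range(2, 5):
    km = KMeans(n_clusters=k, n_init=10, random_state=0)
    labels = km.fit_predict(X)
    scores[k] = silhouette_score(X, labels)

best_k = max(scores, key=scores.get)
for k, s in scores.items():
    print('k={}: silhouette={:.3f}'.format(k, s))
print('Best k by silhouette:', best_k)
```

On the full dataset the range of `k` would need to be wider, and the silhouette score could be computed on a sample to keep the run time manageable.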

Return to Contents